Optimal high performance computing facility setup

Hello community :slight_smile:

I have been granted 10,000 cores for a total of 500,000 CBU of compute time. If I'm correct, this means I am able to run VirtualFlow for 50 hours using 10,000 cores (500,000/10,000). This also means I'm not able to screen 1 billion compounds, but fewer. In the publication the VirtualFlow authors mentioned that leveraging 10,000 cores would allow screening 1 billion compounds in 336 hours, so for my 50 hours this means I can screen roughly 140 million compounds.
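
To double-check my own arithmetic, here is the back-of-the-envelope calculation as a small shell snippet (the 1 billion compounds in 336 hours figure is taken from the publication; this is just my estimate, not part of VirtualFlow):

```bash
#!/usr/bin/env bash
# Back-of-the-envelope estimate of wall time and screenable compounds
budget=500000      # granted compute budget in core-hours (CBU)
cores=10000        # cores used in parallel
hours=$(( budget / cores ))                    # 50 hours of wall time
# Publication reference point: ~1 billion compounds in 336 hours on 10,000 cores
compounds=$(( 1000000000 * hours / 336 ))      # ~148 million compounds
echo "${hours} hours of wall time, roughly ${compounds} compounds"
```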

Now my question is: if I screen 140 million compounds with 10,000 cores, what would be the optimal parameters in the ctrl file for such a setup, e.g. steps_per_job=1, cpus_per_step=1, queues_per_step=1 or cpus_per_queue=1?

In addition, how many jobs would be appropriate to use in ./vf_start_jobline.sh 1 10 templates/template1.slurm.sh submit 1?

Let me know what you think!


Hi Kaneki,

Welcome back, and congratulations on the computation time you have been granted :slight_smile:

VirtualFlow is quite flexible regarding these settings (so that it can run on any HPC system). The optimal settings will depend on the precise HPC system you are using.

If, for example, your HPC system always allocates full compute nodes to users/jobs, then I would set cpus_per_step and queues_per_step to the number of cores per compute node. For such an HPC system I would set steps_per_job to something like 10, meaning 10 nodes per job, and cpus_per_queue=1 is always recommended. So if there are, for instance, 32 cores per node, then one job (with 10 nodes per job) would use a total of 320 cores. Thus if you want to use 10,000 cores in parallel in this case, you would need to submit around 31 jobs.
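
To make that concrete, this is roughly what those settings would look like in the ctrl file for a system with 32 cores per node (the values just mirror the example above, so adjust them to your actual nodes; this is only an illustrative excerpt, not a complete ctrl file):

```text
# 10 nodes per job
steps_per_job=10
# One step spans a full 32-core node
cpus_per_step=32
# One queue per core of the node
queues_per_step=32
# Each queue uses a single core (always recommended)
cpus_per_queue=1
```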

The number of compounds you can screen with your computation time will also depend on the processor speed. Maybe your CPUs are faster than the ones which were used for the publication :slight_smile:

Thanks for your elaborate answer. That makes sense. However, where do you get the 31 jobs from?

Here is my calculation: (10000 cores in total)/((32 cores per node)x(10 nodes per job))=31.25 :slight_smile:
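
Or, as a quick check in the shell (integer division rounds down, so you can either submit 31 jobs and stay slightly under 10,000 cores, or 32 and go slightly over):

```bash
echo $(( 10000 / (32 * 10) ))   # -> 31 (exact value is 31.25)
echo $(( 31 * 32 * 10 ))        # -> 9920 cores with 31 jobs
echo $(( 32 * 32 * 10 ))        # -> 10240 cores with 32 jobs
```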

| Number of nodes | Cores per node |
|---|---|
| 177 | 32 |
| 1080 | 24 |
| 540 | 24 |
| 32 | 32 |
| 64 | 16 |
| 18 | 64 |
steps_per_job=10
cpus_per_step=24
queues_per_step=24
cpus_per_queue=1

(10,000 cores in total) / ((24 cores per node) × (10 nodes per job)) ≈ 41.7 jobs to be submitted
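
As a quick check of what that rounds to in actual cores (just shell arithmetic, assuming I only use the 24-core nodes):

```bash
echo $(( 10000 / (24 * 10) ))   # -> 41 jobs (exact value ~41.7)
echo $(( 41 * 24 * 10 ))        # -> 9840 cores with 41 jobs
echo $(( 42 * 24 * 10 ))        # -> 10080 cores with 42 jobs
```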

./vf_start_jobline.sh 1 41 templates/template1.slurm.sh submit 1

I assume this would be good then?
I want to screen 150 million compounds; do I have to change the following settings in the ctrl file:

central_todo_list_splitting_size=10000
ligands_todo_per_queue=1000
ligands_per_refilling_step=100
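
For context, my rough reasoning on the per-queue workload (assuming cpus_per_queue=1, so roughly one queue per core, i.e. about 10,000 queues in total):

```bash
compounds=150000000   # total compounds I want to screen
queues=10000          # ~one queue per core with cpus_per_queue=1
echo $(( compounds / queues ))   # -> 15000 ligands per queue over the whole run
```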