What are the optimal parameters for the all.ctrl file on the O2 (Orchestra2) cluster?

Evangelos · February 24, 2021, 6:27pm

How many ligands per minute or hour should I expect to screen on O2, with flexible and non-flexible side chains?
Could you please let me know what the optimal settings for running on O2 would be?

In particular, I used a 12.3 million compound tranche from the REAL library with the following parameters:
steps_per_job=1
cpus_per_step=1
queues_per_step=1
cpus_per_queue=20
central_todo_list_splitting_size=10000
ligands_todo_per_queue=1000
ligands_per_refilling_step=10

Is this OK, or do you recommend other values for O2?

Christoph · February 26, 2021, 10:25pm

Dear Evangelos,

with the O2 cluster, it is not obvious what the best strategy is, since there are multiple queues with different allowed job types and runtimes.

I assume that you will be able to get the largest number of CPUs if you use the short queue. For more on the available queues, see:
https://wiki.rc.hms.harvard.edu/display/O2/How+to+choose+a+partition+in+O2

The lower you set cpus_per_queue, the fewer CPUs each job uses, and the more likely a job is to start, because it can more easily squeeze into open gaps on the compute nodes.
Thus I would recommend trying:

steps_per_job=1
cpus_per_step=1
queues_per_step=1
cpus_per_queue=1
central_todo_list_splitting_size=10000
ligands_todo_per_queue=1000
ligands_per_refilling_step=10

Then start hundreds or even over 1000 jobs in parallel. You will need to test how well this currently works, as the O2 cluster is also updated continuously.

Regarding the runtime per docking instance, this depends on many different settings and differs from case to case. Flexibility is just one of the settings that play a role. To find out more, you can simply run a docking on your local computer. This is recommended anyway to check that the docking is working as you expect (i.e. you should verify the results of a few test dockings to make sure the docking settings/scenario behave as intended).
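
As a rough illustration of how a local test run translates into a throughput estimate, the sketch below scales a measured per-ligand docking time up to a number of CPU cores. The timing value (30 seconds per ligand) and the function name are hypothetical placeholders, not measurements from O2 or VirtualFlow. It also assumes one docking per core at a time and ignores queueing and I/O overhead.

```python
def ligands_per_hour(seconds_per_ligand: float, cores: int) -> float:
    """Idealized throughput: each core docks one ligand at a time."""
    if seconds_per_ligand <= 0:
        raise ValueError("seconds_per_ligand must be positive")
    return cores * 3600.0 / seconds_per_ligand


# Hypothetical value: replace with the time measured in your own test docking.
measured_seconds_per_ligand = 30.0

for cores in (1, 300, 1000):
    rate = ligands_per_hour(measured_seconds_per_ligand, cores)
    print(f"{cores:>5} cores: about {rate:,.0f} ligands per hour")
```

Running the same test docking once with flexible and once with rigid side chains would give two values of the per-ligand time to plug in here.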

Best,
Christoph

Evangelos · March 16, 2021, 12:19pm

Hello Christoph,
Thank you for the response. I used these settings and tried to run more compounds as short runs, using a tranche with ~600K compounds.
I also tried to run 1000 jobs in parallel with
./vf_start_jobline.sh 1 1000 templates/template1.slurm.sh submit 3

But although 1000 jobs were submitted, only about 300 were used.
Can you explain this? Might VirtualFlow not use all the jobs if there are more than necessary? Should I then first calculate how many jobs I need?

Then after 12 hours the jobs stopped, but I believe the run was incomplete. Should I set ligands_todo_per_queue to less than 1000 so that each queue finishes within 12 hours?
And how can I restart the run to continue it? Is it enough to just run the following again?
./vf_start_jobline.sh 1 1000 templates/template1.slurm.sh submit 3

Christoph · March 23, 2021, 7:01pm

Dear Evangelos,

But although 1000 jobs were submitted only about 300 were used. Can you explain this?

I assume that this was because O2 only allowed you to use around 300 CPU cores at the time, because no more were available.

The jobs probably stopped after 12 hours because of the runtime limit of the jobs you submitted, or because of the queue/partition you used (e.g. the short partition probably has a 12-hour limit). Setting ligands_todo_per_queue to less than 1000 doesn’t change much, because the average ligand collection size is about 1000, so the amount of work per queue doesn’t get any smaller.
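
To make the walltime reasoning concrete, here is a back-of-the-envelope sketch. It assumes a hypothetical per-ligand docking time (which must be measured for your own setup), the roughly 1000-ligand collection size and 12-hour limit mentioned above, and that one queue processes a collection sequentially. The function names are illustrative and not part of VirtualFlow.

```python
import math

WALLTIME_HOURS = 12          # assumed limit of the short partition
AVG_COLLECTION_SIZE = 1000   # approximate ligands per collection


def hours_per_collection(seconds_per_ligand: float,
                         collection_size: int = AVG_COLLECTION_SIZE) -> float:
    """Time for one queue (one CPU) to dock a whole collection sequentially."""
    return collection_size * seconds_per_ligand / 3600.0


def collections_in_library(total_ligands: int,
                           collection_size: int = AVG_COLLECTION_SIZE) -> int:
    """Approximate number of collections, i.e. independent units of work."""
    return math.ceil(total_ligands / collection_size)


# Hypothetical timing; replace with a value measured in a test docking.
seconds_per_ligand = 30.0

hours = hours_per_collection(seconds_per_ligand)
print(f"One collection: about {hours:.1f} h (limit {WALLTIME_HOURS} h)")
print(f"Fits in one job: {hours <= WALLTIME_HOURS}")
print(f"~600K ligands: about {collections_in_library(600_000)} collections")
```

If the estimated time per collection comes close to the partition limit, that is a hint that a single job may not finish its collection before being stopped.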

If a job ends and the run is not complete, VirtualFlow automatically resubmits successor jobs, so nothing should be required from your side.

Evangelos · March 23, 2021, 10:18pm

Thank you. That’s probably what happened (O2 allowed only 300 CPU cores).

The run only completed after I re-ran the command
./vf_start_jobline.sh 1 1000 templates/template1.slurm.sh submit 3

It then took only about 30 minutes to 1 hour to complete, which surprised me. I did indeed use the short 12-hour partition. Is there a workaround, or would it be better to use the medium 5-day partition? Would that reduce the number of CPUs available to work in parallel?

Regarding my other question about the error messages: does it have to do with the way I installed VFTools?

And finally, do you have a tutorial on how to extract SMILES from the results?
(Feel free to move the questions to more appropriate threads, or let me know and I will post them there.)

Thank you!