I am very interested in VFVS and am planning to run it on a 10,000-core HPC system. I did a lot of testing before setting up the final large-scale screening and have some concerns about the following issues.
- I found that after submitting the vf_start_jobline.sh command, and before the actual calculations start, the whole input-files directory is copied to /tmp/ on every node. If one screens a huge library (say the 1.4-billion-compound REAL library) on a large number of nodes (e.g. 32 cores/node, 312 nodes in total), this would put a high load on the cluster's internal network. It also requires a large amount of storage on /tmp. Since /tmp (or whatever VF_TMPDIR points to) should ideally be on high-speed storage such as an SSD, this may further increase the cost.
Would it be easy to modify the VirtualFlow scripts so that only a subset of the library is copied to each node? Ideally, only the compound collections that will actually be processed on a given node would be copied to that node.
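To illustrate what I have in mind, here is a minimal sketch; the function name, variable names, and file layout are made up for illustration and are not actual VFVS internals. Each node would copy only the collection archives listed in its own assignment file:

```shell
# Hypothetical sketch, not actual VFVS code: copy only the compound
# collections assigned to this node into its node-local scratch directory.
copy_assigned_collections() {
    local library_dir="$1"   # shared library, e.g. input-files/ligand-library
    local node_tmpdir="$2"   # node-local scratch, e.g. under VF_TMPDIR
    local assigned_list="$3" # text file: one collection archive path per line

    local collection
    while IFS= read -r collection; do
        [ -n "$collection" ] || continue
        # preserve the collection's subdirectory layout under the scratch dir
        mkdir -p "$node_tmpdir/$(dirname "$collection")"
        cp "$library_dir/$collection" "$node_tmpdir/$collection"
    done < "$assigned_list"
}
```

With per-collection archives, each node would then only transfer the few gigabytes it actually docks, rather than the full library.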
- It seems that /dev/shm/ is created but not actually used; all of its subdirectories are empty. Since a RAM disk is extremely fast, and most modern HPC systems are equipped with large amounts of memory, I wonder if we could make better use of /dev/shm. The AutoDock-family programs themselves do not seem to consume much memory, so making better use of the RAM disk may be worthwhile.
The following questions are not about the development of VF but about its use. If you feel it is necessary, I can re-post them in another section of this forum.
As a new user, I am not yet familiar with the different AutoDock-family programs such as Vina, qvina02, Smina, and so on (there may be even more). I did a quick search but could not find a website or paper that carefully introduces all of them. With more searching I might find something useful, but could you simply recommend some literature? Thanks! As you may understand, before beginning a huge screen I have to plan the whole strategy, and one of the most important questions is which program to use, weighing docking accuracy against computing speed. Suggestions from a highly experienced expert like you would be invaluable.
Perhaps a reasonable strategy is to run VFVS in at least two stages? In the first stage I would use qvina02 to take advantage of its high speed (accepting its relatively lower accuracy), then take the top 10 million hits and run VFVS a second time using something like Smina. What is your suggestion? Thanks!
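The hand-off between the two stages could be as simple as the following sketch, assuming a whitespace-separated ranking file with one compound ID and docking score per line (a more negative score being better); the file names and column layout here are assumptions for illustration:

```shell
# Hypothetical sketch: select the N best-scoring compounds from a
# first-stage ranking file (columns: compound_id score, more negative
# is better) to feed into the second-stage screen.
select_top_hits() {
    local ranking_file="$1"
    local n_hits="$2"
    # ascending general-numeric sort on the score column, then keep IDs
    sort -k2,2g "$ranking_file" | head -n "$n_hits" | awk '{print $1}'
}

# e.g.: select_top_hits firststage_results.txt 10000000 > secondstage_ligands.txt
```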
And how about flexible docking? A similar concern relates to the "exhaustiveness" parameter in the AutoDock Vina configuration file. Both may further slow down the calculation and dramatically increase the cost. Do you have any suggestions?
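For reference, these are the settings I mean in the Vina configuration file; the file names are placeholders, and the value shown for exhaustiveness is Vina's default:

```
receptor = receptor.pdbqt
# flexible side chains (optional; slows docking down considerably)
flex = receptor_flexible_sidechains.pdbqt
# default is 8; higher values are more thorough but proportionally slower
exhaustiveness = 8
```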
Thank you very much! Finally, I would like to say that VirtualFlow is a very good idea and a very good package for people in the drug discovery field. Thank you very much for this wonderful contribution!