Multi-stage screen: Ligand selection & preparation

Hi everyone,

I’m almost done with a 62-million-ligand screen, and I want to run a second round on my top hits using flexible residues in my receptor.
How do I prepare a ligand library for multi-stage screening? You state that you did this in the Nature paper first describing VirtualFlow.

However, I find very little information about this.

Christoph stated the following in the forum:

To prepare the input ligand libraries for the second stage screenings, which contain the top X compounds from the first stage, these need to be prepared manually at this point, because a random selection of the ligands to be screened is not possible at the moment. At the moment, only entire collections of ligands can be screened. We have this on our todo-list for future versions and provide scripts which automate this (if you want to work on this feature, please let us know).

In the Nature paper you did a second screen of 3 million ligands. Clearly you didn’t extract these one by one.

If this needs to be done purely manually, I can extract maybe a few thousand, but it would not be great fun. Do you have a recommended protocol?

Hi @Morten ,

I’m glad to hear you were able to almost complete the 62 million screen.

Regarding the preparation of a library for the second stage, what I meant by “manually” is doing it with some custom bash scripts (or similar).

We also provide a basic script in the VF Tools package: vfvs_prepare_newcollections.sh
that you can find here: VFTools/bin at master · VirtualFlow/VFTools · GitHub

You can find some instructions in the file itself, and you can also look at the source code if needed.

I hope this helps,
Christoph

Hi @Christoph

That’s helpful.
I expect to be ready for the second stage in a matter of days, and I’m now sorting the protocol:

How to run the vfvs_prepare_newcollections.sh:

vfvs_prepare_newcollections.sh <ligand file> <pdbqt_input_folder> <pdbqt_folder_format> <ligands_per_collection> <output folder>

Do you have any general guidelines for <ligands_per_collection>?

What about pdbqt_folder_format?
The possible values are tar_tar, meta, sub_tar, and hash_metatranche.
I currently have the library in the following file structure:
.../ligand-library/XX/XXXXXX.tar
Is that tar_tar or sub_tar?

How to make the selection for second stage screening:
I have looked at the tutorial for how to Complete Ligand Ranking.

I have done two docking scenarios, and I have thus two firstposes.all.minindex.sorted.clean files. I now want to make the selection, merge the ligand files, and delete duplicates. The resulting ligand file can then be used as input for vfvs_prepare_newcollections.sh.
I think I have a good way of doing this:

Let’s say I want to extract ligands with an estimated affinity of -9 kcal/mol or better (that is, a score of -9 or lower), and then write an output file containing the collection name, ligand name, and estimated affinity.

awk -F" " '$4 <= -9 {print $1, $2, $4}' firstposes.all.minindex.sorted.clean > ranked_ligands_1

I then simply use cat to create ligand_merge including both docking scenarios:

cat ranked_ligands_1 ranked_ligands_2 > ligand_merge

Lastly I remove duplicated ligands with:
awk '!a[$2]++' ligand_merge > ligand_merge_rd

Does this seem sensible?
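One thing to watch with the steps above: awk '!a[$2]++' keeps whichever copy of a ligand comes first in ligand_merge, which is not necessarily the better-scoring one. Below is a minimal sketch of the same select, merge, de-duplicate pipeline that sorts by score before de-duplicating. The function names, the temporary folder and the demo rows are made up for illustration; the column positions ($1 collection, $2 ligand, $4 score) are taken from the awk command above and should be checked against your own ranking files.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # keep "." as the decimal separator for sort and awk

# Hypothetical helper: keep rows whose score (column 4) is <= cutoff and
# print collection, ligand and score.
select_hits() {   # usage: select_hits <ranking file> <cutoff>
    awk -v cutoff="$2" '$4 <= cutoff {print $1, $2, $4}' "$1"
}

# Hypothetical helper: merge hit lists, sort by score (most negative first),
# then keep only the first, i.e. best-scoring, row per ligand name.
merge_hits() {    # usage: merge_hits <hit list> [<hit list> ...]
    cat "$@" | sort -k3,3n | awk '!seen[$2]++'
}

# Made-up demo data standing in for the two ranking files.
workdir="$(mktemp -d)"
printf '%s\n' \
    'AAAAAA_00000 lig1 0 -9.5' \
    'AAAAAA_00000 lig2 0 -8.0' \
    'BBBBBB_00001 lig3 0 -10.1' > "$workdir/scenario1.clean"
printf '%s\n' \
    'AAAAAA_00000 lig1 0 -9.9' \
    'CCCCCC_00002 lig4 0 -9.0' > "$workdir/scenario2.clean"

select_hits "$workdir/scenario1.clean" -9 > "$workdir/ranked_ligands_1"
select_hits "$workdir/scenario2.clean" -9 > "$workdir/ranked_ligands_2"
merge_hits "$workdir/ranked_ligands_1" "$workdir/ranked_ligands_2" \
    > "$workdir/ligand_merge_rd"

cat "$workdir/ligand_merge_rd"
```

In the demo, lig1 appears in both scenarios and only its -9.9 row survives. Whether the three-column output is accepted as-is by vfvs_prepare_newcollections.sh is not something this sketch checks.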

I have been trying:

vfvs_prepare_newcollections.sh /home/rekggla/Scratch/VF_upload/merge_n9_dr /home/rekggla/Scratch/tmp/ligand-library/ tar_tar 1000 /home/rekggla/Scratch/VF_upload/lib_merge_n9_5_dr/

and

vfvs_prepare_newcollections.sh ../../../Scratch/VF_upload/merge_n9_5_dr ../../../Scratch/tmp/ligand-library/ tar_tar 1000 ../../../Scratch/VF_upload/lig_merge

It results in:


              Extracting the winning structrures                 

/home/rekggla/programs/VFTools/bin/vfvs_prepare_newcollections.sh: line 133: …//home/rekggla/Scratch/VF_upload/merge_n9_dr: No such file or directory

*** The preparation of the intermediate folders has been completed ***

*** Starting the preparation of the length.all file ***

  • If the file /home/rekggla/Scratch/VF_upload/lib_merge_n9_5_dr/.length.all exists already it will be cleared.
    ls: cannot access /home/rekggla/Scratch/VF_upload/lib_merge_n9_5_dr/.tmp2: No such file or directory

*** The preparation of the length.all file has been completed ***

*** Starting the preparation of the tar archives ***
/home/rekggla/programs/VFTools/bin/vfvs_prepare_newcollections.sh: line 150: cd: /home/rekggla/Scratch/VF_upload/lib_merge_n9_5_dr/.tmp2: No such file or directory
Error was trapped
Error in bash script vfvs_prepare_newcollections.sh
Error on line 150
Exiting.

I can’t really make sense of this. The errors refer to lines 133 and 150 of the script, which would be related to my input library (the same library I used for my first screen). They also refer to a .tmp2 folder that the script presumably should create but fails to.
What am I doing wrong here?

I have kept on trying, and I have been testing on a computer running Linux.
It is hard to understand all this from what is available as documentation.

I have tried all the different pdbqt_folder_format values because I don’t know which one to use. I used the REAL library, which I prepared following the VF tutorial. This library is then also used as input for vfvs_prepare_newcollections.sh.

It seems that the script has difficulty finding the paths. The script seems to look here:
ligand-library/ABCDEF.tar

While the actual library has sub-folders:
ligand-library/AB/ABCDEF.tar

Any ideas on how to do this?
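A quick way to see which of the two layouts a library folder really has is to count the .tar files at each depth. This is only a sketch: library_layout is a hypothetical helper, not part of VFTools, and it only tells tar files directly in the library folder apart from tar files one subfolder down. It does not decide which pdbqt_folder_format name applies. It assumes a find that supports -mindepth and -maxdepth (GNU and BSD find do).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: report where the .tar archives sit in a library folder.
#   flat    -> ligand-library/ABCDEF.tar
#   nested  -> ligand-library/AB/ABCDEF.tar
#   unknown -> no archives found, or a mix of both
library_layout() {   # usage: library_layout <library dir>
    local lib="$1" flat nested
    # $(( ... )) strips the padding some wc implementations add.
    flat=$(( $(find "$lib" -mindepth 1 -maxdepth 1 -type f -name '*.tar' | wc -l) ))
    nested=$(( $(find "$lib" -mindepth 2 -maxdepth 2 -type f -name '*.tar' | wc -l) ))
    if [ "$nested" -gt 0 ] && [ "$flat" -eq 0 ]; then
        echo nested
    elif [ "$flat" -gt 0 ] && [ "$nested" -eq 0 ]; then
        echo flat
    else
        echo unknown
    fi
}

# Made-up demo folder mimicking the layout described above.
demo_lib="$(mktemp -d)"
mkdir -p "$demo_lib/AB"
: > "$demo_lib/AB/ABCDEF.tar"

library_layout "$demo_lib"
```

Running it on the real library instead of the demo folder would be library_layout /path/to/ligand-library.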


./vfvs_prepare_newcollections.sh ../test_firstposes.all.minindex.sorted.clean ligand-library tar_tar 100 lib_merge/


*********************************************************************
                  Extracting the winning structrures                 
*********************************************************************


 *** Adding the ligand XX-XXXXXXXXXXXX_X_XX to the collection XXXXXX_XXXXX-0001 ***
tar: ../../ligand-library/ABCDEF.tar: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
tar: ABCDEF/00000.pdbqt.gz.tar: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now

...

 *** The preparation of the intermediate folders has been completed ***

 *** Starting the preparation of the length.all file ***
 * If the file lib_merge/.length.all exists already it will be cleared.

 *** Adding the collection XXXXXX_00000-0001 to the length.all file ***
./vfvs_prepare_newcollections.sh: line 168: ../../lib_merge/.length.all: No such file or directory
Error was trapped
Error in bash script vfvs_prepare_newcollections.sh
Error on line 168
Exiting.

Hello, have you solved this problem? I am also facing the same confusion, thank you very much.

Hi @Morten,

congrats on your 62 million screen using VirtualFlow :)

I can try to help you with this issue. First of all, the pdbqt_folder_format you have is called “meta” (e.g. AB/ABCDEF.tar). Please also make sure that you use the latest version of the VFTools script from github (VFTools/vfvs_prepare_newcollections.sh at master · VirtualFlow/VFTools · GitHub).

Please try running it this way and let me know if that works. If not, it would be great if you could give more info on your “ligand file” (however, the firstposes.all.minindex.sorted.clean file should be fine). Additionally, more info on your “pdbqt input folder” could be helpful.

All the best
Chris


Hi @_Chris_Secker,

Thanks for getting back to me.

I am also using the latest script, and I have now tested with meta. Still no luck.

Here’s an example:

./vfvs_prepare_newcollections.sh ../test_firstposes.all.minindex.sorted.clean ../ligand-library meta 100 lib_merge/

*********************************************************************
                  Extracting the winning structrures                 
*********************************************************************


 *** Adding the ligand PV-002015206817_3_T1 to the collection JBFCEG_00001-0001 ***


 * Extracting collection JBFCEG_00001
tar: ../../ligand-library//JBFCEG.tar: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
./vfvs_prepare_newcollections.sh: line 141: cd: JBFCEG: No such file or directory
Error was trapped
Error in bash script vfvs_prepare_newcollections.sh
Error on line 141
Exiting.

As you can see I’m running the script locally and I have it in the same folder as my input files.
I have tried multiple permutations: full paths to the input files and folders, or just their folder names. I have to put ../ before my ligand input file for it to be read.

I think there’s a confusion with the folder hierarchy, and I can’t see how to fix that.
In the example above it tries to access the ligand in /ligand-library//JBFCEG.tar. Is the // wrong? Should it not be /JB/, like this: /ligand-library/JB/JBFCEG.tar?

PS:
Do you have any tips regarding <ligands_per_collection>? What are sensible guidelines for this number?

Hi @Morten,

thanks for the info. I wonder why there is no metatranche info in the tar command. Can you give me an example line from your ligand file? And yes, exactly: it should be ligand-library/JB/JBFCEG.tar

Regarding ligands_per_collection, a general recommendation is 1000 to 10000 for what I’d call an average cluster. But it largely depends on the nodes and the configuration of your Slurm cluster, and on the docking programs and settings you are using. E.g. if you want one job to work on ~10 collections, you should make sure that the job does not exceed the time limit of the Slurm partition it is running on. How much time a node needs to process one ligand in turn depends on how many CPUs the job uses on the node, which docking program you use, which settings of the program you use, etc.
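To make that trade-off concrete, here is a rough back-of-the-envelope estimate of the wall time of one job. It is only a sketch: estimate_job_hours is a hypothetical helper, the 30 seconds per ligand and 16 CPUs are made-up numbers you would replace with timings from your own first-stage run, and it assumes the work is spread perfectly evenly over the CPUs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: estimated wall time (hours) of one job.
# Arguments: ligands per collection, collections per job,
#            CPU seconds per ligand, CPUs per job.
estimate_job_hours() {
    LC_ALL=C awk -v ligands="$1" -v collections="$2" \
                 -v seconds="$3" -v cpus="$4" \
        'BEGIN { printf "%.1f\n", ligands * collections * seconds / cpus / 3600 }'
}

# 1000 ligands x 10 collections x 30 s, spread over 16 CPUs
estimate_job_hours 1000 10 30 16    # prints 5.2

# Same job with 10000 ligands per collection
estimate_job_hours 10000 10 30 16   # prints 52.1
```

If the second figure is above the time limit of your partition, a smaller ligands_per_collection or fewer collections per job would be the knobs to turn.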

Best
Chris


Hi @_Chris_Secker,

./vfvs_prepare_newcollections.sh ../test_firstposes.all.minindex.sorted.clean ligand-library meta 100 lib_merge/


*********************************************************************
                  Extracting the winning structrures                 
*********************************************************************


 *** Adding the ligand PV-002015206817_3_T1 to the collection JBFCEG_00001-0001 ***


 * Extracting collection JBFCEG_00001
tar: ../ligand-library//JBFCEG.tar: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
./vfvs_prepare_newcollections.sh: line 141: cd: JBFCEG: No such file or directory
Error was trapped
Error in bash script vfvs_prepare_newcollections.sh
Error on line 141
Exiting.

I tried changing the ligand-library input to an absolute path, to …/ligand-library, ligand-library/ etc. In all cases the script fails to build the intended path.

The tar in the example above is indeed in the library:
ligand-library/JB/JBFCEG.tar

I don’t see how I can change the input library path in such a way that the script reads correctly. Is that possible or is this a bug in the script?

This is the relevant code. I kind of get what is happening, but I’m above all a wet-lab scientist and my bash scripting skills are rudimentary.

elif [ "${pdbqt_folder_format}" == "meta" ]; then
    if [ "${new_collection}" == "true" ]; then
        echo
        echo
        echo " * Extracting collection ${collection}"
        rm -r ${old_tranche} &>/dev/null || true
        tar -xf ../${pdbqt_input_folder}/${metatranche}/${tranche}.tar ${tranche}/${collection_no}.tar.gz || true
        cd ${tranche}
        tar -xzf ${collection_no}.tar.gz || true
        cd ..
    fi
    cp ${tranche}/${collection_no}/${ligand}.pdbqt ../${output_folder}.tmp2/${collection_new}/${ligand}.pdbqt || true

This is fixed now.

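If you’re interested: I swapped lines 86 and 87. In the original order, metatranche was derived from tranche before tranche had been set for the current collection, which is where the // in the path came from.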

Going from:

metatranche="${tranche:0:2}"
tranche="${collection/_*}"

To:


tranche="${collection/_*}"
metatranche="${tranche:0:2}"
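For anyone who wants to see the effect of the ordering in isolation, here is a small stand-alone sketch. It is not the VFTools script itself: the collection name is taken from the log earlier in this thread, the ligand-library/ prefix is just for display, and it assumes tranche is still empty when the first ligand is processed.

```shell
#!/usr/bin/env bash
set -euo pipefail

collection="JBFCEG_00001"   # collection name from the log in this thread

# Original order: metatranche is cut from a tranche that is still empty.
tranche=""
metatranche="${tranche:0:2}"   # first two characters of "" -> ""
tranche="${collection/_*}"     # delete "_" and everything after -> JBFCEG
buggy_path="ligand-library/${metatranche}/${tranche}.tar"

# Swapped order: tranche is set first, so metatranche can be cut from it.
tranche="${collection/_*}"     # JBFCEG
metatranche="${tranche:0:2}"   # JB
fixed_path="ligand-library/${metatranche}/${tranche}.tar"

echo "$buggy_path"   # ligand-library//JBFCEG.tar
echo "$fixed_path"   # ligand-library/JB/JBFCEG.tar
```

The first line printed has the same empty path segment as the tar error above, and the second matches where the archive actually lives.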