Is there a way to select a diverse representative subset of the REAL library?
Chuck
Yes, this is possible by adding one simple extra step when setting up the virtual screening.
When you go to the tranche table to select the part of the library you are interested in, select the part with the diversity you want. Then download the collections-length file as usual. It will contain all the collections to be screened, for example:
CACBDE_00000
CACBDE_00001
CACBDE_00002
CACBDE_00003
CACBDE_00004
CACBDE_00005
which all belong to the tranche CACBDE. Since each tranche usually has many collections associated with it (like 00000 to 00005 above), you can select a subset of the library without decreasing the diversity by removing some of the collections that belong to the same tranche. In the example above, you could remove the collections
CACBDE_00001
CACBDE_00002
CACBDE_00003
CACBDE_00004
CACBDE_00005
and retain only the collection
CACBDE_00000
of the tranche CACBDE.
To reduce the size of the screening library while retaining the diversity, you can therefore randomly remove collections from the collections-length file. This can be done, for instance, with the command
shuf -n 100 collections.txt > collections_new.txt
which randomly selects 100 lines and stores them in the file collections_new.txt. You can now use the file collections_new.txt as the collections-length file when setting up the screening.
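Note that shuf -n samples lines without regard to tranche, so with a small sample size some tranches can drop out entirely. If you want every tranche to keep at least one collection, as in the CACBDE example, a small shell function along these lines could do it. This is a sketch, not part of VirtualFlow: the name subsample_per_tranche is hypothetical, and it assumes each line of the collections-length file starts with a collection name of the form TRANCHE_NUMBER (such as CACBDE_00000), so that the tranche is the text before the first underscore.

```shell
# Hypothetical helper (not part of VirtualFlow).
# Keeps at most K randomly chosen collections per tranche.
# Assumption: each line starts with a collection name like CACBDE_00000,
# so the tranche is everything before the first "_".
subsample_per_tranche() {
  local k="$1" infile="$2"
  # shuf randomizes the line order; awk then keeps a line only while
  # fewer than k lines of the same tranche have been seen so far.
  # The final sort just restores a predictable order.
  shuf "$infile" | awk -F_ -v k="$k" 'seen[$1]++ < k' | sort
}
```

For example, subsample_per_tranche 1 collections.txt > collections_new.txt keeps one randomly chosen collection per tranche, while a larger first argument keeps up to that many. Drop the final sort if the order of the collections does not matter.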
Does this answer your question?
@Chuck There is also a second option. If you go to the tranche browser of the REAL library (https://virtual-flow.org/real-library), you can simply narrow down one or two of the descriptors to a very small range. This still gives you a diverse set of compounds with respect to the remaining four or five properties.
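If you have already downloaded a collections-length file, a similar effect might be approximated offline by filtering on the tranche code. This sketch assumes (not confirmed in this thread) that each of the six letters in a tranche name such as CACBDE encodes the bin of one descriptor, in the same order as the tranche browser columns; please check that mapping against the REAL library documentation before relying on it. The function name filter_by_descriptor_bin is hypothetical.

```shell
# Hypothetical helper (not part of VirtualFlow).
# Keeps only collections whose tranche code has LETTER at position POS.
# Assumptions: the tranche code is the text before the first "_", and
# each of its letters encodes the bin of one descriptor.
filter_by_descriptor_bin() {
  local pos="$1" letter="$2" infile="$3"
  # substr($1, p, 1) extracts the single letter at position p of the tranche code
  awk -F_ -v p="$pos" -v l="$letter" 'substr($1, p, 1) == l' "$infile"
}
```

For example, filter_by_descriptor_bin 1 C collections.txt > collections_new.txt keeps only the tranches whose first letter is C. Running it a second time on the output with another position narrows a second descriptor.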
Hi Christoph,
Do you also need to modify the tranches.sh file accordingly if you run this command to randomly sample the collections?
shuf -n 100 collections.txt > collections_new.txt
Thanks
Hi @gal,
No, you don’t need to modify the tranches.sh file; all you need to do is prepare the collections file the way you want it.
Best,
Christoph