REAL Library Usage

Is there a way to select a diverse, representative subset of the REAL library?

Chuck

Yes, that is possible by adding one simple step when setting up the virtual screening.

When you go to the tranche table to select the part of the library you are interested in, select the part with the diversity you want. Then download the collections-length file as usual. It will contain all the collections to be screened, for example

CACBDE_00000 
CACBDE_00001 
CACBDE_00002 
CACBDE_00003 
CACBDE_00004
CACBDE_00005

which all belong to the tranche CACBDE. Since each tranche usually has many collections associated with it (like 00000 to 00005 above), you can select a subset of the library without decreasing its diversity by removing some of the collections that belong to the same tranche. In the example above, you could remove the collections

CACBDE_00001 
CACBDE_00002 
CACBDE_00003 
CACBDE_00004
CACBDE_00005

and retain only the collection

CACBDE_00000 

of the tranche CACBDE.
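To do this for every tranche at once rather than by hand, a short awk filter can keep only the first collection listed for each tranche. This is a minimal sketch, assuming that each line of the collections-length file starts with a collection name of the form TRANCHE_NNNNN (as in the example above), possibly followed by whitespace and the collection length. The function name one_per_tranche is hypothetical and not part of VirtualFlow.

```shell
# Hypothetical helper (not part of VirtualFlow): keep only the first
# collection listed for each tranche.
# Assumes each line starts with a collection name such as CACBDE_00000,
# optionally followed by whitespace and the collection length.
# Reads the file given as $1, or standard input if no file is given.
one_per_tranche() {
  awk '{
    split($1, parts, "_")    # parts[1] is the tranche name, e.g. CACBDE
    if (!seen[parts[1]]++)   # print only the first line seen for each tranche
      print
  }' "${1:--}"
}

# Usage (file names are examples):
# one_per_tranche collections.txt > collections_new.txt
```

The relative order of the kept lines is preserved, so the output is still a valid list of collections in the same format as the input.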

To reduce the size of the screening library while retaining its diversity, you can therefore randomly remove collections from the collections-length file, for instance with the command

shuf -n 100 collections.txt > collections_new.txt

which randomly selects 100 lines and stores them in the file collections_new.txt. You can now use collections_new.txt as the collections-length file when setting up the screening.
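Because shuf -n picks lines without looking at the tranche names, some tranches can lose all of their collections by chance. If you want every tranche to stay represented, one option is to shuffle first and then keep at most k collections per tranche. The sketch below assumes the same TRANCHE_NNNNN line format as above and GNU shuf; the function name sample_per_tranche and its parameter k are hypothetical.

```shell
# Hypothetical helper (not part of VirtualFlow): randomly keep at most k
# collections per tranche, so every tranche in the input keeps at least one
# (for k >= 1).
# Usage: sample_per_tranche K [FILE]   (reads standard input if FILE is omitted)
sample_per_tranche() {
  shuf "${2:--}" | awk -v k="$1" '{
    split($1, parts, "_")        # parts[1] is the tranche name
    if (++count[parts[1]] <= k)  # keep the first k shuffled lines per tranche
      print
  }'
}

# Example: keep one random collection per tranche.
# sample_per_tranche 1 collections.txt > collections_new.txt
```

Unlike shuf -n 100, this caps the number of collections per tranche rather than the total number, and the output lines come out in shuffled order.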

Does this answer your question?

@Chuck There is also a second option. In the tranche browser of the REAL library (https://virtual-flow.org/real-library), you can narrow one or two of the descriptors down to a very small range. This still gives you a diverse set of compounds with respect to the remaining four or five properties.