We conducted a screen using the VirtualFlow (VF) version of the Enamine REAL library and would like to recover the original SMILES strings. However, many of the IDs in VF do not match the current REAL library, version 2020q1, which we downloaded from the Enamine website. It looks like the Enamine version in VF is 2018q12. The suffixes applied to the compound IDs also differ; I am assuming we have to discard everything after the first underscore (e.g. PV-001912990848_1_T1 -> PV-001912990848).
Examples of IDs not found in Enamine 2020q1:
PV-001912990848, PV-002035615736, …
Is there a way you could make the Enamine REAL database available in exactly the version you downloaded?
You are right that the version which we currently provide is a 2018 version.
The first suffix enumerates the stereoisomer, and the second suffix the tautomerization state. If you get compounds from Enamine, you usually get mixtures. For the dockings, however, specific stereoisomers and tautomers are docked, since they affect the docking result.
So, as you indicated, you just need to remove the suffixes if you want to get the molecules from Enamine.
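To make the suffix handling concrete, here is a minimal Python sketch. It assumes every VF ID has the form `<Enamine ID>_<stereoisomer>_<tautomer>`, as in the example PV-001912990848_1_T1 above. The function names `enamine_id`, `split_vf_id` and `unique_enamine_ids` are made up for illustration and are not part of VirtualFlow.

```python
from typing import Iterable, List, Optional, Tuple


def enamine_id(vf_id: str) -> str:
    """Drop everything after the first underscore of a VF ligand ID."""
    return vf_id.split("_", 1)[0]


def split_vf_id(vf_id: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split a VF ligand ID into (Enamine ID, stereoisomer, tautomer).

    Assumption: the first suffix is the stereoisomer index and the
    second suffix is the tautomerization state; missing suffixes
    are returned as None.
    """
    parts = vf_id.split("_")
    stereo = parts[1] if len(parts) > 1 else None
    tautomer = parts[2] if len(parts) > 2 else None
    return parts[0], stereo, tautomer


def unique_enamine_ids(vf_ids: Iterable[str]) -> List[str]:
    """Map VF IDs to Enamine IDs, removing duplicates but keeping order."""
    return list(dict.fromkeys(enamine_id(v) for v in vf_ids))


if __name__ == "__main__":
    print(enamine_id("PV-001912990848_1_T1"))
    print(split_vf_id("PV-001912990848_1_T1"))
```

Several docked states can map to the same Enamine ID, so deduplicating before looking the IDs up in the Enamine catalog avoids repeated queries.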
In addition to @Christoph's response (thanks!), some compounds from the 2018q12 version that are absent from the latest REAL library can still be synthesized on request at Enamine. The contact email for that is libraries@enamine.net . We are going to generate models from the newest version soon.
Thanks @Christoph and @Malets for those responses. I believe it would be of tremendous help for us and probably other users as well, if you could make the original 2018 version that you downloaded available in some form, if necessary per-request and also if Enamine agrees. The main hurdle is matching the SMILES with the docked conformations. Please let us know if that is something you think would be feasible.
Do you mean for each docked compound (or compound in the ready-to-dock version of the REAL library, i.e. the VirtualFlow version of the REAL library) you would like to get the original (Enamine) SMILES before the compound was modified during the ligand preparation process?
Yes, for each compound in the ready-to-dock version of REAL… reconstructing those SMILES from the 3D docked conformations is error-prone and depends on the tool used. For example, aromaticity information is lost, hydrogens can be missing, and methyl groups turn into C radicals. All this makes it very hard to enter reconstructed SMILES strings into a database other than Enamine's, such as https://mcule.com/search/. Of course, we can turn to Enamine to help us match those older IDs to SMILES, but it would be great to have the original SMILES available already.
Jens
P.S.: If the compound was modified (stereo-isomers, tautomers)… I don’t know if that can be reflected in an additional SMILES field, as well.
Yes, converting PDBQT files to SMILES is indeed problematic.
This is why in each ligand pdbqt (input) file, the SMILES (of the prepared ligand, i.e. the specific state/stereoisomer) is included as a comment in the header of the file (as a PDB REMARK entry). Since the REMARK entries are retained in the pdbqt docking output files, you can even find them in these files. It seems that this is what you need, is that right?
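For anyone who wants to pull the SMILES out of the header programmatically, here is a rough Python sketch. The exact layout of the REMARK line is an assumption on my part: the pattern below expects something like `REMARK    SMILES: <string>`, so please check it against one of your own files and adjust it if the header differs. The names `SMILES_REMARK`, `smiles_from_pdbqt_text` and `smiles_from_pdbqt_file` are illustrative only.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed header layout: "REMARK    SMILES: CC(=O)O".
# Adjust this pattern if your files use a different label or separator.
SMILES_REMARK = re.compile(r"^REMARK\s+SMILES\b\s*[:=]?\s*(\S+)", re.IGNORECASE)


def smiles_from_pdbqt_text(text: str) -> Optional[str]:
    """Return the SMILES from the first matching REMARK line, or None."""
    for line in text.splitlines():
        match = SMILES_REMARK.match(line)
        if match:
            return match.group(1)
    return None


def smiles_from_pdbqt_file(path: str) -> Optional[str]:
    """Read a PDBQT input or docking output file and extract its SMILES."""
    return smiles_from_pdbqt_text(Path(path).read_text())


if __name__ == "__main__":
    example = (
        "REMARK    Compound: PV-001912990848_1_T1\n"
        "REMARK    SMILES: CC(=O)O\n"
        "ATOM      1  C   LIG     1       0.000   0.000   0.000\n"
    )
    print(smiles_from_pdbqt_text(example))
```

The function returns `None` when no such REMARK is present, which is what you would see if a preprocessing step has stripped the header.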
In addition, as you suggested, we could in principle make available the original SMILES as well (before protonation/tautomerization) as a SMILES library if you think that would be helpful for users of VirtualFlow as well.
Indeed, I now looked at the original inputs which do contain the SMILES string. Thanks for the explanation. These REMARKs were unfortunately lost due to an additional preprocessing step on our side, hence the confusion. However, I think the original SMILES string from Enamine would be very helpful to have available, at least if it increases the chances of finding a molecule that way.
Anybody still getting results from the database?
It seems Enamine has updated something, and I'm not able to search for any compound; not even the old searches in my history show up anymore.