I have a ligand collection that is a little lumpy: some subfolders have more than 1,000 compounds, while others have only one. I understand from previous discussions that an even distribution of roughly 100 compounds per folder is ideal for processing.
These files were not generated by VFLP; they are pdbqt files obtained from another source. I have written my own script to reorganize them so that I can control how they are distributed. The files work within virtual-flow, but throughput is poor. I suspect that some processors are stuck working through folders of ~1,000 compounds while others sit idle after finishing their single-compound jobs.
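One way to even out the load is to pool all the compounds first and then split the pool into the fewest chunks of at most ~100, with chunk sizes differing by at most one. The sketch below assumes the ligands can be treated as a flat list of file names; `balanced_chunks` and the `target=100` default are hypothetical illustrations, not part of VirtualFlow.

```python
import math
from typing import List, Sequence


def balanced_chunks(items: Sequence[str], target: int = 100) -> List[List[str]]:
    """Split items into the fewest chunks of at most `target` items,
    with chunk sizes differing by at most one (hypothetical helper)."""
    if target < 1:
        raise ValueError("target must be >= 1")
    if not items:
        return []
    # Fewest chunks that keep every chunk at or below the target size
    n_chunks = math.ceil(len(items) / target)
    # Spread the remainder over the first `extra` chunks
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


# Example: 1,001 pooled ligands become 11 chunks of 91 instead of 10 x 100 + 1
ligands = [f"ligand_{i:05d}.pdbqt" for i in range(1001)]
sizes = [len(c) for c in balanced_chunks(ligands)]
print(sizes)
```

Balancing the sizes, rather than filling folders of exactly 100 and leaving a small remainder, avoids recreating the single-compound stragglers described above.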
As far as I can tell, the VFVS folder layout requires the following (using A to stand for any character A-Z):
AA/AABxxxx/00000
where each 00000 subfolder is tarred and gzipped inside its AABxxxx folder, and all of the AABxxxx folders sit under the AA folder.
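For illustration, the layout as described above could be produced along these lines. This is a minimal sketch under assumptions: it takes already-chunked ligand files, writes each chunk as a numbered `NNNNN.tar.gz` under `AA/AABxxxx`, and stores the files inside each archive under a folder of the same number. The helper name `write_collection`, the example collection name `AAB0001`, and the in-archive folder naming are hypothetical and not confirmed against VFVS's own expectations.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List, Sequence


def write_collection(chunks: Sequence[Sequence[Path]], out_dir: Path,
                     tranche: str, collection: str) -> List[Path]:
    """Write each chunk as <out_dir>/<tranche>/<collection>/<NNNNN>.tar.gz
    (hypothetical layout, following the description in the post)."""
    coll_dir = out_dir / tranche / collection
    coll_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for idx, chunk in enumerate(chunks):
        name = f"{idx:05d}"  # 00000, 00001, ...
        archive = coll_dir / f"{name}.tar.gz"
        with tarfile.open(str(archive), "w:gz") as tar:
            for ligand in chunk:
                # Assumption: files sit under a folder named after the archive
                tar.add(str(ligand), arcname=f"{name}/{ligand.name}")
        archives.append(archive)
    return archives


# Demo: five dummy ligand files packed two per archive
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    ligands = []
    for i in range(5):
        p = tmp_path / f"lig{i}.pdbqt"
        p.write_text("REMARK dummy\n")
        ligands.append(p)
    chunks = [ligands[i:i + 2] for i in range(0, len(ligands), 2)]
    archives = write_collection(chunks, tmp_path / "out", "AA", "AAB0001")
    archive_names = [a.name for a in archives]
    members = []
    for a in archives:
        with tarfile.open(str(a), "r:gz") as tar:
            members.append(sorted(tar.getnames()))

print(archive_names)
```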
Question:
Should I redistribute the files?
If so, which is the better way to even out the distribution:
creating additional numbered subfolders (e.g. 00000, 00001, 00002)
-or-
creating folders with different letter names (e.g. AAB1, AAB2, AAB3)?
Thanks in advance!