Rebalancing ligand collection

I have a ligand collection that is rather lumpy: some subfolders hold more than 1000 compounds, others only 1. I understand from previous communications that an even distribution of roughly 100 compounds per folder is ideal for processing.

These files were not generated by VFLP; they are pdbqt files obtained from another source. I have written my own script to reorganize them so that I can control how they are distributed. The files work within VirtualFlow, but throughput is poor. I suspect that some processors are stuck working through 1,000 compounds while others sit idle after finishing their single-compound job.
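
For reference, the kind of rebalancing I have in mind looks roughly like the sketch below. It is only a minimal Python illustration: the function name `rebalance`, the dictionary input, and the default target of 100 are my own placeholders, and it only decides which ligands go together. It does not write the VFVS folder or archive layout.

```python
from typing import Dict, List


def rebalance(collections: Dict[str, List[str]], target: int = 100) -> List[List[str]]:
    """Pool all ligand names and split them into near-equal chunks of at most `target`."""
    # Flatten in a deterministic order so repeated runs give the same chunks.
    ligands = sorted(name for names in collections.values() for name in names)
    if not ligands:
        return []
    # Number of chunks, rounded up so that no chunk exceeds `target`.
    n_chunks = -(-len(ligands) // target)
    # Spread the remainder over the first `extra` chunks (sizes differ by at most 1).
    base, extra = divmod(len(ligands), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(ligands[start:start + size])
        start += size
    return chunks
```

With one subfolder of 1000 ligands and another of 1, this gives 11 chunks of 91 ligands each rather than one chunk of 1000 and one of 1.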

As far as I can tell, VFVS requires the following folder layout (using A to stand in for any character A-Z):


where the subfolders containing 00000 are gzipped and tarred within the AABxxxx folder, all of which sit under the ‘AA’ folder.


Should I redistribute the files?

If so, when redistributing to even out the distribution, is it better to create additional numbered subfolders (e.g. 000000, 000001, 000002), or differently lettered folders (e.g. AAB1, AAB2, AAB3)?

Thanks in advance!