VFTools failing to generate docking poses from VFVS_GK

I could not find version information for VFVS or VFTools, but I cloned them from git within the past week (October 2021).

I am trying to follow VFVS Tutorial1.

The VFVS run completed successfully, and I am now trying to extract the docking poses of the top 100 ranked compounds, as described in the tutorial section "The completed workflow".

The first step is to run

vfvs_pp_ranking_all.sh ../../output-files/complete/ 2 meta_tranche

from the pp/ranking dir, and it completed successfully.

In the next step I created the pp/docking_poses/qvina02_rigid_receptor1 directory and from there ran

head -n 100 ../../ranking/qvina02_rigid_receptor1/firstposes.all.minindex.sorted.clean > compounds

This also completed successfully:

$ head -6 compounds
GACEBG_00000 Z2624037004_3 -10.3 1
GACEBG_00000 Z2624037004_4 -10.3 1
GACEBG_00000 Z2087256678_1 -9.8 1
GACEBG_00000 Z2087260951_2 -9.8 1
GACEBG_00000 Z2087260951_4 -9.8 1
GACECF_00000 PV-001701895824_1 -9.8 1

The problem happens in the next step, and I have attached the output of this command

$ vfvs_pp_prepare_dockingposes.sh ../../../output-files/complete/qvina02_rigid_receptor1/results/ meta_collection compounds dockingsposes overwrite
docking_poses_error.txt (53.8 KB)

Studying vfvs_pp_prepare_dockingposes.sh, I have the following observations:

Lines 49-50:
# Variables
results_folder=${1}

Lines 123-131:
elif [ "${format}" == "meta_collection" ]; then
metatranch=${tranch:0:2}
if ! cp ../../../../${results_folder}/${metatranch}/${tranch}/${collection_no}.tar.gz ./; then
if ! cp ../../../..${results_folder/commplete/incomplete}/${metatranch}/${tranch}/${collection_no}.tar.gz ./; then
echo " * Error, skipping this ligand"
cd ../../../../
continue
fi

Are the relative paths in lines 125 and 126 correct?

Edit: I see in lines 84-85

mkdir -p ${output_folder}/${tranch}/${collection_no}/${molecule}
cd ${output_folder}/${tranch}/${collection_no}/${molecule}

which makes the relative paths in lines 125 and 126 correct.

However, in my case there is no ${results_folder}/${metatranch}/${tranch}/${collection_no}.tar.gz; there is only ${results_folder}/${metatranch}/${tranch}.tar.
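A quick way to see which of the two layouts a results folder has is sketched below. The function name detect_results_layout is hypothetical (it is not part of VFTools), and it assumes that the first column of the compounds file has the form <tranch>_<collection_no>, e.g. GACEBG_00000, and that the per-tranch archive sits directly under the metatranch folder as I see in my output.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of VFTools.
# Reports which archive layout exists for one collection in a results folder.
# Usage: detect_results_layout <results_folder> <collection_id>
detect_results_layout() {
    local results_folder="${1%/}"              # drop a trailing slash, if any
    local collection_id="$2"                   # e.g. GACEBG_00000 (assumed format)
    local tranch="${collection_id%%_*}"        # GACEBG
    local collection_no="${collection_id#*_}"  # 00000
    local metatranch="${tranch:0:2}"           # GA, same rule as in the script

    if [ -f "${results_folder}/${metatranch}/${tranch}/${collection_no}.tar.gz" ]; then
        # One tar.gz per collection: the path the meta_collection branch copies
        echo "meta_collection"
    elif [ -f "${results_folder}/${metatranch}/${tranch}.tar" ]; then
        # One tar per tranch: the layout present in my output
        echo "meta_tranch"
    else
        echo "unknown"
        return 1
    fi
}
```

Called with the results folder and the first column of a line in compounds, it prints the format name that matches the files actually on disk.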

Hello dmukhop1 and welcome to the VirtualFlow community!

For this purpose, I would recommend you use vfvs_pp_all.sh , part of VFTools.

This script post-processes the virtual screening data and stores the post-processed files in the folder pp (postprocessing). More specifically, it does the following:
1) It prepares the full ranking of all docked compounds for each docking scenario, and stores it in the folder pp/firstposes.
2) It extracts and reformats the docking poses of the best <poses_compound_count> compounds, and stores them in the folder pp/docking_poses.

Kind regards,

Sorin

@Sorin Thanks for your reply, I will try that. It would also be nice if the instructions in the tutorial worked :)

OK, it looks like there is a typo in Tutorial1.

When the instruction

vfvs_pp_prepare_dockingposes.sh ../../../output-files/complete/qvina02_rigid_receptor1/results/ meta_collection compounds dockingsposes overwrite

was changed to

vfvs_pp_prepare_dockingposes.sh ../../../output-files/complete/qvina02_rigid_receptor1/results/ meta_tranch compounds dockingsposes overwrite

the command completed without issue.

I think I found the reason why the command
vfvs_pp_prepare_dockingposes.sh ../../../output-files/complete/qvina02_rigid_receptor1/results/ meta_collection compounds dockingsposes overwrite was not working.

In the Tutorial1 bundle (https://virtual-flow.org/sites/virtual-flow.org/files/tutorials/VFVS_GK.tar) linked in the Installation section of Tutorial1, the tools/templates/all.ctrl file is missing the following section (which is present in the VFVS version cloned from GitHub):

outputfiles_level=collection
# Possible values:
#   * collection  : The collection output files are stored in tar.gz format. They are stored in subfolders named by metatranch and tranch to reduce the number of files per folder.
#                   Advantages:
#                       * Less I/O on the shared cluster file system (as existing tranch archives don't have to be read during storage of completed collectionsds)
#                       * No risk of output-file clashes when two queues want to store completed collections on the shared filesystem
#   * tranche      : For each tranch a tar archive is created, which contains the gzipped collection output files.
#                   Advantages:
#                       * Less output files (only for each tranch) for each of the output file types (e.g. results, summaries, logfiles, ...)

Most likely, in the absence of this directive, the default outputfiles_level=tranche is used, which causes the meta_collection argument to fail in the vfvs_pp_prepare_dockingposes.sh script, whereas the meta_tranch argument works.
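To check whether a given control file sets this directive at all, something like the following can be used. This is only a sketch: get_outputfiles_level is a hypothetical helper name, and it assumes the directive is written at the start of a line as outputfiles_level=<value>, as in the excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of VFVS or VFTools.
# Prints the value of outputfiles_level from a control file, or "unset"
# if the file has no such directive.
# Usage: get_outputfiles_level <path/to/all.ctrl>
get_outputfiles_level() {
    local ctrl_file="$1"
    local value

    if [ ! -f "${ctrl_file}" ]; then
        echo "unset"
        return 1
    fi

    # Keep only lines that start with the directive; comment lines start
    # with '#' and are therefore ignored. The last occurrence wins.
    value=$(sed -n 's/^outputfiles_level=//p' "${ctrl_file}" | tail -n 1)
    echo "${value:-unset}"
}
```

Running it on tools/templates/all.ctrl from the tutorial bundle and from the GitHub clone should show the difference described above: "unset" for a file without the directive, and the configured value otherwise.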