Virtual screening with VEGA ZZ and GriDock

1 Introduction
2 What you need
3 NAMD installation
4 Download of the target protein
5 Protein preparation
6 Creation of the input files for GriDock/AutoDock
7 Databases
8 Evaluation of the starting complex
9 Running the screening
10 Results
 

 

1 Introduction

Structure-based virtual screening is an approach to find new hit compounds in a database of 3D molecules. This tutorial explains how to prepare the input files required by GriDock to find potential inhibitors of the HIV-1 protease.

 

2 What you need

 

3 NAMD installation

If you haven't already installed NAMD, follow these steps; otherwise skip this section.

 

4 Download of the target protein

You can download the HIV-1 protease structure (1HPV) through the PDB Web interface or the tool integrated in VEGA ZZ:
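If you prefer to fetch the structure from a script, the download can be sketched in Python as follows. This is only an illustration and not part of VEGA ZZ: the URL pattern of the RCSB file server and the function names pdb_url and download_pdb are assumptions introduced here.

```python
import urllib.request


def pdb_url(pdb_id: str) -> str:
    """Build the download URL of a PDB entry.

    Assumption: the RCSB file server exposes entries as <ID>.pdb
    under https://files.rcsb.org/download/.
    """
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def download_pdb(pdb_id: str, dest: str) -> str:
    """Download the entry and save it as dest (network access required)."""
    urllib.request.urlretrieve(pdb_url(pdb_id), dest)
    return dest


# Example (not executed here because it needs a network connection):
# download_pdb("1HPV", "1HPV.pdb")
```

The saved 1HPV.pdb file can then be opened in VEGA ZZ like any other local file.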

 

5 Protein preparation

The hydrogens may be added incorrectly to the co-crystallized ligand, because the algorithm is specific for proteins and the ligand can assume an unusual geometry in the binding pocket.

To optimize the structure completed by adding the hydrogens, the atomic charges and the atom potentials must be assigned.

Now we are ready to run the NAMD minimization, but in order to preserve the starting experimental structure, we need to constrain the protein backbone, whose coordinates will be kept fixed during the calculation.

 

6 Creation of the input files for GriDock/AutoDock

The crystallization water molecules aren't needed and can create problems, because they are considered fixed and can't be displaced by the ligand during the docking calculation.
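Outside VEGA ZZ, the same cleanup can be sketched on the PDB text itself. The example below is a minimal illustration, assuming that waters are stored as ATOM/HETATM records with the residue name HOH (or WAT) in the standard fixed columns 18-20; strip_waters is a hypothetical helper, not a VEGA ZZ command.

```python
WATER_NAMES = ("HOH", "WAT")  # assumed residue names used for water


def strip_waters(pdb_lines):
    """Return the PDB lines without the water atom records."""
    kept = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        res_name = line[17:20].strip()  # residue name, PDB columns 18-20
        if is_atom and res_name in WATER_NAMES:
            continue  # drop this water atom
        kept.append(line)
    return kept


# Example:
# with open("1HPV.pdb") as handle:
#     dry = strip_waters(handle.read().splitlines())
```

All other records (protein, ligand, TER, END) are passed through unchanged.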

In order to generate complexes in which the ligand is placed in the same pocket as the co-crystallized one, you need to select the atoms included in a 10 Å sphere around the binding site.
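The geometric idea behind this selection can be sketched as follows. This is not the VEGA ZZ selection tool: the helper names are hypothetical, and taking the centroid of the co-crystallized ligand as the center of the sphere is an assumption made for the example.

```python
import math


def _xyz(line):
    """Read x, y, z from the fixed PDB columns 31-54."""
    return (float(line[30:38]), float(line[38:46]), float(line[46:54]))


def centroid(pdb_lines):
    """Geometric center of the atoms found in the given PDB lines."""
    coords = [_xyz(l) for l in pdb_lines if l.startswith(("ATOM", "HETATM"))]
    if not coords:
        raise ValueError("no atom records found")
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def atoms_in_sphere(pdb_lines, center, radius=10.0):
    """Return the atom records lying within radius (in Å) of center."""
    selected = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if math.dist(_xyz(line), center) <= radius:
            selected.append(line)
    return selected
```

For example, atoms_in_sphere(protein_lines, centroid(ligand_lines)) returns the protein atoms within 10 Å of the ligand center.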

To run virtual screenings with GriDock, the receptor structure must be pre-processed by assigning the AMBER atom types, fixing the atom charges, removing the apolar hydrogens and saving the molecule in PDBQT format. All these steps are performed automatically by the Docking\AutoDock\Receptor.c script.

 

7 Databases

GriDock requires one or more databases that must be manageable by VEGA (SDF or Zip format) and must contain the 3D structures of the molecules that you want to screen. You can build your own databases using the tools included in VEGA ZZ, you can convert a database from 2D to 3D through the Database 2D to 3D.c script, or you can download one from Web sites such as Ligand.Info: Small-Molecule Meta-Database (http://ligand.info). In this tutorial, a small database will be downloaded from that Web site.

If you want to check the database, you can open it with VEGA ZZ:

 

8 Evaluation of the starting complex

In order to compare the screening results with the co-crystallized ligand, it is useful to evaluate the interaction energy of the complex. As a first step, you need to extract the ligand from the crystal structure.

WARNING:
Don't normalize the ligand atom coordinates because you need to preserve the starting position.

 

To start the screening on a Windows system:

gridock -t score.dpf 1HPV.pdbqt Ligand.pdbqt

The -t option selects another input template file for AutoDock 4. The score.dpf template keeps the starting position and conformation of the ligand and evaluates the interaction energy. If you need more information about GriDock, please read the manual.

This is an example of the log output:

10:31:39 INIT: GriDock 1.0.0.20 started on Windows
10:31:39 INIT: Local time Mon, 02 Feb 2009 11:31:39
10:31:39 INIT: Cpu model: AMD Opteron(tm) Processor 250
10:31:39 INIT: CPUs/Cores detected: 2
10:31:39 INIT: CPUs/Cores used: 1
10:31:39 INIT: AutoDock/VEGA directory: "D:\Documenti\Lcc\Vega"
10:31:39 INIT: AutoDock executable: "D:\Documenti\Lcc\Vega\AutoDock4.exe"
10:31:39 INIT: VEGA executable: "D:\Documenti\Lcc\Vega\Vega.exe"
10:31:39 INIT: Receptor file: "1HPV.pdbqt"
10:31:39 INIT: Database file: "Ligand.pdbqt"
10:31:39 INIT: AutoDock template file: "score.dpf"
10:31:39 INIT: AutoDock output archive: "1HPV-Ligand_01.zip"
10:31:39 INIT: Max. size of AutoDock output archive: 4000000000 bytes
10:31:39 INIT: Energy output file: "1HPV-Ligand.csv"
10:31:39 INIT: Temporary file directory: "C:\DOCUME~1\ALESSA~1\IMPOST~1\Temp"
10:31:39 INIT: First molecule to dock: 1
10:31:39 INIT: Input database in PDBQT format: one molecule only will be docked
10:31:39 INIT: AMMP time-out: 120 sec.
10:31:39 INIT: AutoDock time-out: 12000 sec.
10:31:39 INIT: VEGA time-out: 120 sec.
10:31:39 INFO: Starting AutoDock - Molecule 1 (Ligand)
10:31:41 INFO: Molecule 1 - Docking finished (0m 2s)
10:31:41 DOCK: Molecule 1 - Best model 1, Best Binding energy = -8.29 kcal/mol, Ki = 838.53 uM
10:31:41 INFO: End of calculation
10:31:41 INFO: Docked molecules 1
10:31:41 INFO: Elapsed time 0h 0m 2s

The line with the DOCK tag reports the binding quality (binding energy and Ki). You can find the same information in the 1HPV-Ligand.csv file, formatted so that it can be managed by a spreadsheet (e.g. Microsoft Excel):

Database; MolID; Pose; Ki; Binding; Intermolecular; VdW + Hbond + Desolv; Electrostatic; Internal; Torsional; Unbound; Molecule name
Ligand; 1; 1; 838,53; -8,29; -9,99; -9,34; -0,65; -1,32; 3,02; 0,00; Ligand
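If you prefer a script to a spreadsheet, the file can be read with a few lines of Python. The sketch below assumes the layout shown above (semicolon-separated fields, decimal commas, eight numeric columns from Ki to Unbound); read_gridock_csv is a hypothetical helper name.

```python
import csv
import io


def read_gridock_csv(text):
    """Parse the GriDock energy output into a list of dictionaries."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = [name.strip() for name in next(reader)]
    rows = []
    for fields in reader:
        fields = [f.strip() for f in fields]
        if len(fields) != len(header):
            continue  # skip blank or incomplete lines
        row = dict(zip(header, fields))
        row["MolID"] = int(row["MolID"])
        row["Pose"] = int(row["Pose"])
        for key in header[3:11]:  # columns from Ki to Unbound
            # Convert the decimal comma written by GriDock to a dot
            row[key] = float(row[key].replace(",", "."))
        rows.append(row)
    return rows


# Example:
# with open("1HPV-Ligand.csv") as handle:
#     rows = read_gridock_csv(handle.read())
```

If your system locale uses the dot as decimal separator, the replace call simply leaves the values unchanged.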

If you want to check the resulting complex:

 

9 Running the screening

The Screening directory now contains all the files required by the screening:

  1. The receptor file in PDBQT format (1HPV.pdbqt).
  2. The map files used by AutoDock to evaluate the ligand-receptor interaction energy (1HPV.*.map and 1HPV.maps.*).
  3. The database containing the 3D structures of the molecules to screen (ChemBank.sdf).

To start the screening, type:

gridock 1HPV.pdbqt ChemBank.sdf

and hit return. When GriDock starts, it detects the number of installed CPUs/cores and uses each of them by assigning it a working thread. The time required to screen all 2,344 molecules contained in the ChemBank.sdf database depends on the computational power of your system. You can check the calculation progress by viewing the log file (gridock_YYYYMMDD.log) with a text editor.
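Instead of reading the log by eye, the progress can also be summarized with a short script. This is a sketch that assumes the DOCK lines have the same layout as in the example log of section 8; the regular expression and the function name docked_so_far are introduced here for illustration.

```python
import re

# Assumed layout of a result line, taken from the example log:
# "... DOCK: Molecule 1 - Best model 1, Best Binding energy = -8.29 kcal/mol, ..."
DOCK_RE = re.compile(
    r"DOCK: Molecule (\d+) - Best model (\d+), "
    r"Best Binding energy = ([-+]?\d+(?:\.\d+)?) kcal/mol"
)


def docked_so_far(log_text):
    """Return (molecule number, best model, binding energy) for each DOCK line."""
    return [
        (int(mol), int(model), float(energy))
        for mol, model, energy in DOCK_RE.findall(log_text)
    ]


# Example (replace the name with your actual log file):
# with open("gridock_YYYYMMDD.log") as handle:
#     print(len(docked_so_far(handle.read())), "molecules docked")
```

The length of the returned list tells how many molecules have been completed so far.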

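To screen only a part of the database, use the -f (first molecule) and -l (last molecule) options. For example:

gridock -f 10 -l 200 1HPV.pdbqt ChemBank.sdf

screens only the molecules in the range from 10 to 200. If you want to start from the first molecule, the -f option can be omitted: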

gridock -l 200 1HPV.pdbqt ChemBank.sdf

To save time, you can screen only the first 200 molecules of the ChemBank database by running GriDock with the command shown in the previous line.
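If you want to launch these runs from a script, the command lines above can be assembled as in the following sketch. Only the options that appear in this tutorial (-f, -l and -t) are used; the wrapper functions are hypothetical and assume that the gridock executable is in the search path.

```python
import subprocess


def gridock_command(receptor, database, first=None, last=None, template=None):
    """Build the gridock argument list, with the options before the file names."""
    cmd = ["gridock"]
    if first is not None:
        cmd += ["-f", str(first)]  # first molecule to dock
    if last is not None:
        cmd += ["-l", str(last)]  # last molecule to dock
    if template is not None:
        cmd += ["-t", template]  # alternative AutoDock 4 template file
    cmd += [receptor, database]
    return cmd


def run_gridock(receptor, database, **options):
    """Run GriDock and raise an error if it exits with a non-zero code."""
    return subprocess.run(gridock_command(receptor, database, **options), check=True)


# Example: screen the first 200 molecules only.
# run_gridock("1HPV.pdbqt", "ChemBank.sdf", last=200)
```

Building the command as a list avoids quoting problems when the file names contain spaces.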

 

10 Results

You can consider a ligand as a potential HIV-1 protease inhibitor when its binding energy and/or its binding constant (Ki) are lower than the values obtained for the co-crystallized ligand in section 8 (respectively -8.29 kcal/mol and 838.53 uM, as reported by GriDock).
Considering the molecules from 1 to 200 in the ChemBank database and analyzing the 1HPV-ChemBank.csv file, you can obtain these results:

Database; MolID; Pose; Ki; Binding; Intermolecular; VdW + Hbond + Desolv; Electrostatic; Internal; Torsional; Unbound; Molecule name
ChemBank; 123; 9; 72,55; -9,74; -9,74; -9,51; -0,23; 0,00; 0,00; 0,00; Paxilline
ChemBank; 101; 6; 282,90; -8,93; -9,83; -9,87; 0,03; -0,51; 0,82; -0,59; 24,25-Dihydroxyvitamin D3
ChemBank; 75; 4; 545,05; -8,54; -8,54; -8,40; -0,15; 0,00; 0,00; 0,00; Grayanotoxin III
ChemBank; 151; 7; 793,77; -8,32; -8,84; -8,82; -0,02; -0,30; 0,55; -0,27; Go6976
ChemBank; 17; 4; 796,25; -8,32; -9,10; -9,04; -0,06; -0,82; 1,10; -0,50; WIN 55, 212-2
ChemBank; 97; 4; 1,06; -8,15; -8,88; -8,88; -0,01; -0,68; 0,82; -0,59; 25-Hydroxyvitamin D3
ChemBank; 133; 4; 1,21; -8,07; -8,71; -7,72; -0,99; -0,33; 0,55; -0,42; TTNPB
...

The 1HPV-ChemBank.csv output file can also be analyzed easily by opening it with Microsoft Excel or any other spreadsheet software able to manage comma-separated value files (CSV format). The Windows version of GriDock detects the locale settings and writes the output with the required decimal separator, so that the file can be opened directly without conversions.

The complexes with a binding energy lower than -8.29 kcal/mol (the first five rows of the table above) could be potential HIV-1 protease inhibitors. Please remember that the results are reported as an example only and most probably they are unusable in a real biological environment (e.g. Paxilline is a toxin produced by Penicillium paxilli, Grayanotoxin III is a toxic diterpenoid, etc.).
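The selection described above can be sketched in Python as follows. The rows are assumed to be dictionaries with the Binding and Molecule name columns of the CSV file already converted to numbers and text; the helper names are hypothetical. The sketch ranks the molecules by binding energy rather than by the raw Ki figure, because the Ki column seems to change unit prefix from row to row (796,25 is followed by 1,06 for a weaker binder). As a cross-check, the usual relation Ki = exp(dG / RT) is included, assuming T = 298.15 K; the temperature and the constant are assumptions, not values taken from the GriDock output.

```python
import math

R_KCAL = 1.9872e-3  # gas constant in kcal/(mol*K)
T_KELVIN = 298.15  # assumed temperature


def ki_from_binding_energy(dg_kcal):
    """Estimate Ki in mol/L from the binding free energy (kcal/mol)."""
    return math.exp(dg_kcal / (R_KCAL * T_KELVIN))


def potential_hits(rows, reference_energy=-8.29):
    """Return the rows that bind better than the reference, best first."""
    hits = [row for row in rows if row["Binding"] < reference_energy]
    return sorted(hits, key=lambda row: row["Binding"])


# Example with three rows of the table above:
rows = [
    {"Molecule name": "Go6976", "Binding": -8.32},
    {"Molecule name": "Paxilline", "Binding": -9.74},
    {"Molecule name": "TTNPB", "Binding": -8.07},
]
for row in potential_hits(rows):
    print(row["Molecule name"], row["Binding"])
```

With this relation, -8.29 kcal/mol corresponds to roughly 8.4e-7 mol/L, so it is worth checking the unit of each Ki value before comparing them directly.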

 

WARNING:
Due to the stochastic nature of the genetic algorithm implemented in AutoDock 4, the results that you obtain could differ from the values reported in this tutorial.