Included scripts

13.3 Scripts

13.3.1 Introduction

The VEGA ZZ package includes several scripts, which are placed in the ...\VEGA ZZ\Scripts directory with the following sub-directory structure:

Scripts
	_Templates (hidden folder)
	ADMET
	Ammp
	Build
	Calculation
	Color
	Common
	Communication
	Database
	Development tools
	DNA tools
	Docking
		AutoDock
		PLANTS
		Vina
		Other docking scripts
	Examples
	File conversion
	Interaction surface
	Movie
	Protein tools
		Homology modelling services
	PubChem
		Database rename
		Download
		Get
	QSAR
		Linear regression
		Virtual screening
	Trajectory
	Utilities

13.3.2 _Templates

This folder contains the templates used when a new script is created. It is not shown in the script tree, but you can reach it by selecting Help → Explore data directory and opening the Scripts folder.

OpenGL.c	Template for OpenGL C scripts.

Rebol.r	Template for REBOL scripts.

Standard.c	Template for standard C scripts.

Window API.c	Template window with Close button (Windows API version).

Window GraphApp.c	Template window with Ok button (GraphApp GUI version).

Window GraphApp Calc.c	Template window with Calculate button. When this button is clicked, the main window is hidden and the abort dialog is shown. Pressing its Abort button stops the calculation. This script template requires the GraphApp GUI.

13.3.3 ADMET

This directory includes scripts for ADMET (absorption, distribution, metabolism, elimination and toxicity) prediction.

BBB permeation predictor.c

This script was generated automatically by Tree2C. It uses a decision tree to classify molecules as permeant or non-permeant with respect to the blood-brain barrier (BBB). For more information, click here.

MetaClass builder.c

MetaClass predictor.vll

MetaClass is a comprehensive classification system for predicting the metabolic reactions of a given molecule or of a set of molecules. The prediction is based on a machine-learning algorithm (Random Forest), which was trained on the metabolic data collected and classified in the MetaQSAR database. The MetaClass package includes two modules: MetaClass builder and MetaClass predictor. The former automatically generates not only the models but also the code that can be used directly in the VEGA ZZ environment for the prediction, while the latter is the result of a MetaClass builder run.

Software requirements for MetaClass predictor:

MOPAC 2016 (freely available after registration at http://openmopac.net). For the version selection and the correct installation, click here.

Software requirements for MetaClass builder:

The same packages required for MetaClass predictor.
Weka 3.8 or greater (freely available at https://www.cs.waikato.ac.nz/ml/weka/index.html).
Java (freely available at https://www.java.com, required to run Weka).
Tree2C (included in VEGA ZZ package).
Tcc C compiler (included in VEGA ZZ package).
A MetaQSAR database in a supported format (ODBC data source, SQLite or Microsoft Access).

MetaClass predictor

The MetaClass predictor is a compiled C-Script (available for both x64 and x86 architectures) that predicts the metabolic reactions a given molecule undergoes according to MetaQSAR rules. To run it, start VEGA ZZ, select File → Run script in the main menu, expand the ADMET branch and double-click MetaClass predictor.vll. If a molecule is present in the current workspace, the prediction is performed for that single molecule and the results are shown in the VEGA ZZ console. If the workspace is empty, a file requester lets you select an input database, which must be in a format supported by VEGA ZZ (Microsoft Access, Mol2, ODBC data source, SDF, SMILES, SQLite or Zip). In this second case, the prediction is performed for all molecules in the database and the results are saved to a CSV file.
Since the training set used in the learning phase (the substrates classified in the MetaQSAR database) includes only molecules in non-ionized form, with the exception of quaternary ammonium salts, the molecules for which you want to predict the metabolic reactions must also be in their neutral form.
Here is the typical output shown by the VEGA ZZ console for a single-molecule prediction (e.g. Naproxen):

* Prediction of MetaQSAR metabolic reactions
* Assigning atom charges
  Total charge: 0.00

   Reaction                                       Substrate  Dom. viol.
  ======================================================================
   01 - Oxidation of Csp3                         Yes        0
   02 - Oxidation of Csp2 & Csp                   Yes        0
   03 - -CHOH <-> >C=O -> -COOH                   Yes        0
   04 - Various redox reactions of carbon atoms   No         0
   05 - Redox reactions of R3N                    No         0
   06 - Oxidation of >NH, >NOH and -N=O // Reduc  Yes        0
   07 - Oxidation to quinones or analogs // Redu  Yes        0
   08 - Oxidation and reduction of S atoms        No         0
   09 - Redox reactions of other atoms            Yes        0
   11 - Hydrolysis of esters, lactones and inorg  Yes        0
   12 - Hydrolysis of amides, lactams and peptid  No         0
   13 - Epoxide hydration                         No         0
   14 - Other hydrolysis/hydration reactions //   No         0
   21 - O-Glucuronidations & glycosylations       No         0
   22 - N- and S-Glucuronidations // All other g  No         0
   23 - Sulfonations (O-, N-, ...)                Yes        0
   24 - GSH & RSH conjugations + sequels // GSH-  No         0
   25 - Acetylations & acylations                 No         0
   26 - CoASH-Ligation followed by amino acid co  Yes        0
   27 - Methylations (O-, N-, S-)                 No         0
   28 - Other conjugations (PO4, CO2, ...) // Tr  No         0

For each reaction class (according to the metabolic reaction classification in MetaQSAR), the output table shows the code and the description (Reaction column), whether the molecule is a substrate or not (Substrate column) and the number of domain violations (Dom. viol. column). This counter indicates how many parameters/attributes are outside the range of the property space of the training set. If this value is not zero, the prediction might be less accurate.
If the input is a database of molecules, the output is saved to a CSV file whose name you choose in a file requester. The prediction can be aborted at any time by clicking the Abort button shown in the progress window. The output file includes a column with the name of the molecule for which the prediction is given, and two columns for each reaction class reporting respectively the prediction (1 if the molecule is a substrate, 0 if it is not) and the number of domain violations.
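
The domain-violation counter described above can be illustrated with a minimal Python sketch. It assumes that the property space of the training set is stored as a simple minimum/maximum range per attribute, which is only one plausible reading of the text; the attribute names and values are hypothetical and are not taken from MetaClass.

```python
def count_domain_violations(descriptors, training_ranges):
    """Count the descriptors that fall outside the [min, max] range of the training set."""
    violations = 0
    for name, value in descriptors.items():
        low, high = training_ranges[name]
        if value < low or value > high:
            violations += 1
    return violations


# Hypothetical training-set ranges and descriptor values, for illustration only.
training_ranges = {"mass": (50.0, 900.0), "logP": (-5.0, 9.0), "psa": (0.0, 300.0)}
inside = {"mass": 230.3, "logP": 3.2, "psa": 46.5}
outside = {"mass": 1200.0, "logP": 10.5, "psa": 46.5}

print(count_domain_violations(inside, training_ranges))
print(count_domain_violations(outside, training_ranges))
```

A molecule with a non-zero count is still predicted; the counter only flags that the result may be less reliable.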

MetaClass builder
As explained above, MetaClass builder can generate the predictive models based on the MetaQSAR dataset as well as the C-Script code required to build the MetaClass predictor. To run it, start VEGA ZZ, select File → Run script in the main menu, expand the ADMET branch and double-click MetaClass builder.c. The only input required by the script is a database, which must be in a format supported by VEGA ZZ (ODBC data source, SQLite or Microsoft Access).
MetaClass builder modifies the database structure by adding three new tables, which include the descriptors/attributes calculated with MOPAC and the Kier-Hall approach (namely Prop_Mopac, Prop_Mopac_Atom and Prop_KierHall). Therefore, if you want to keep the original structure, you must create a working copy of the database. Moreover, if the script finds these tables, the descriptors are not recalculated, which speeds up the whole process. If you want to force the descriptor calculation, delete these three tables from the database.

How MetaClass builder works
MetaClass builder uses a multi-step approach to build the final models, which are compiled as link-libraries that can be used directly in VEGA ZZ environment as shown here:

Checking the required programs/components. If one of them is missing, the script aborts.
Checking if the input file/data source (specified by the file requester) is a MetaQSAR-compatible database.
Extracting the reaction classes for which the predictive models will be developed.
Calculating and storing into the same database the Kier-Hall descriptors if required.
Calculating and storing the MOPAC descriptors if required by applying the following keywords: PM7 GEO-OK MMOK 1SCF SUPER THREADS=1. If your molecules are not optimized by MOPAC, you must delete the 1SCF keyword from the VGS_MOPAC_KEYS definition in the script.
Extracting the VEGA-based molecular descriptors from the database, which are calculated by MetaQSAR when the user compiles it.

For each reaction class:
Building for each class a balanced dataset by selecting all substrates of the reaction class and an equal number of non-substrates (which are chosen randomly).
Creating the input file for Weka (in ARFF format) by considering the most significant attributes previously chosen by Select attributes tool implemented in Weka (see ADMET\_MetaClass builder\*.txt files) according to the BestFirst search algorithm (direction = Forward; lookupCacheSize = 1; searchTermination = 5) and the WrapperSubsetEval attribute evaluator (classifier = RandomForest with default settings; doNotCheckCapabilities = False; evaluationMeasure = accuracy, RMSE; folds = 5; seed = 1; threshold = 0.01).
Running Weka to build the models by Random Forest algorithm with the default parameters, which are: weka.classifiers.trees.RandomForest -P 100 -print -I 100 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1.
Checking if the Weka calculation is completed without errors and reading the output file to extract the statistical data.
Running Tree2C to convert the decision trees generated by Weka into C-Script code.
Compiling the C code (for both x64 and x86 versions) into the object file by using Tcc.
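
The first per-class step (building a balanced dataset) can be sketched in Python as follows. This is only an illustration of the sampling idea, not the code used by MetaClass builder: the record layout (`name`, `is_substrate`), the seed and the handling of classes with too few non-substrates are assumptions.

```python
import random


def build_balanced_dataset(molecules, seed=1):
    """Keep every substrate and an equal number of randomly chosen non-substrates."""
    substrates = [m for m in molecules if m["is_substrate"]]
    non_substrates = [m for m in molecules if not m["is_substrate"]]
    rng = random.Random(seed)  # fixed seed so that the selection is reproducible
    # Assumption: if there are fewer non-substrates than substrates, use all of them.
    count = min(len(substrates), len(non_substrates))
    return substrates + rng.sample(non_substrates, count)


# Hypothetical data: 3 substrates and 10 non-substrates of one reaction class.
molecules = [{"name": f"sub{i}", "is_substrate": True} for i in range(3)]
molecules += [{"name": f"non{i}", "is_substrate": False} for i in range(10)]

balanced = build_balanced_dataset(molecules)
print(len(balanced), sum(m["is_substrate"] for m in balanced))
```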

For all reaction classes:
Creating the configuration header file of the main code according to the data collected during the generation of the models.
Copying the main code template (main.c) from ADMET\_MetaClass builder directory to the working directory, compiling and linking it by Tcc with the object files created for each reaction class.
Installing the resulting compiled scripts (.vll and .vl1) into the ADMET scripts directory of VEGA ZZ program.
Cleaning the working directory if VGS_CLEANUP macro is defined in the code of the script (by default this is not performed).

The MetaClass builder generates both x64 and x86 versions of the MetaClass predictor only if both VEGA ZZ x86 and x64 are installed. Usually, only one of the two versions is installed according to your operating system, but you can override this behaviour during the VEGA ZZ setup by choosing the installation of the Live CD creator component.
The working directory is the one in which the MetaQSAR database file is placed; several intermediate files are saved there. In detail, you can find:

config.h: the configuration header of the main part of MetaClass predictor (main.c).
DATABASE_NAME - Model performances.csv: this file includes several statistical data of the models created by Weka (see below).
main.c: the main code of MetaClass predictor.
main_32.o: the object file compiled by Tcc from main.c (x86 version).
main_64.o: the object file compiled by Tcc from main.c (x64 version).
MetaClass predictor.vll: the final script compiled and linked by Tcc (x86 version).
MetaClass predictor.vl1: the final script compiled and linked by Tcc (x64 version).

For each reaction class you can find:

DATABASE_NAME - REACTION_CODE.arff: the Weka input file in ARFF format.
DATABASE_NAME - REACTION_CODE.txt: the Weka output file with the trees in text format.
model_REACTION_CODE.c: the source code of the decision tree translated by Tree2C.
model_REACTION_CODE.o: the object file of the decision tree compiled by Tcc (x86 version).
model_REACTION_CODE_32.o: the object file of the decision tree compiled by Tcc (x86 version).
model_REACTION_CODE_64.o: the object file of the decision tree compiled by Tcc (x64 version).

As explained above, DATABASE_NAME - Model performances.csv includes several statistical data about the performances of the models built by Weka:

Column	Description
Class code	Reaction class code
Description	Description of the metabolic reaction
Attributes	Number of the attributes used to build the model
Non-substrates_(0)	Number of non-substrates (0 class)
Substrates_(1)	Number of substrates (1 class)
Correctly_classified	Number of molecules correctly classified
Correctly_classified_%	Percentage of molecules correctly classified
Incorrectly classified	Number of molecules incorrectly classified
Incorrectly classified_%	Percentage of molecules incorrectly classified
Kappa	Kappa statistic
MAE	Mean absolute error
RMSE	Root mean squared error
RAE	Relative absolute error
RRSE	Root relative squared error
TP_Rate_0	True positive rate for non-substrates (0 class)
FP_Rate_0	False positive rate for non-substrates (0 class)
Precision_0	Precision for non-substrates (0 class)
Recall_0	Recall for non-substrates (0 class)
F-Measure_0	F-Measure for non-substrates (0 class)
MCC_0	Matthews correlation coefficient for non-substrates (0 class)
ROC_Area_0	ROC area for non-substrates (0 class)
PRC_Area_0	PRC area for non-substrates (0 class)
TP_Rate_1	True positive rate for substrates (1 class)
FP_Rate_1	False positive rate for substrates (1 class)
Precision_1	Precision for substrates (1 class)
Recall_1	Recall for substrates (1 class)
F-Measure_1	F-Measure for substrates (1 class)
MCC_1	Matthews correlation coefficient for substrates (1 class)
ROC_Area_1	ROC area for substrates (1 class)
PRC_Area_1	PRC area for substrates (1 class)
TP_Rate_WA	Weighted average of true positive rate
FP_Rate_WA	Weighted average of false positive rate
Precision_WA	Weighted average of precision
Recall_WA	Weighted average of recall
F-Measure_WA	Weighted average of F-Measure
MCC_WA	Weighted average of Matthews correlation coefficient
ROC_Area_WA	Weighted average of ROC area
PRC_Area_WA	Weighted average of PRC area
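
To make some of the columns above concrete, the following Python sketch derives a few of them from the confusion matrix of a binary model, using the standard textbook definitions. It is not the code used by Weka or by MetaClass builder, the counts are hypothetical, and it does not handle degenerate cases in which a denominator is zero.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard statistics for the substrate (1) class from a confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Agreement expected by chance, needed for the Kappa statistic.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # equal to the true positive rate
    f_measure = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts for one reaction class.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```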

Mutagenicity predictor.c

This script was generated automatically by Tree2C. It uses a decision tree to classify molecules as mutagenic or non-mutagenic. For more information, click here.

13.3.4 Ammp

The scripts included in this directory are useful to run some AMMP jobs automatically.

Automatic Boltzmann jump.c	This script performs a conformational analysis of the molecule in the current workspace by the Boltzmann jump algorithm. In detail, it generates 1000 conformations at a temperature of 500 K and minimizes each of them by the conjugate gradients algorithm (3000 steps, 0.01 RMS). The conformations are automatically saved into a DCD trajectory. After the conformational search, a cluster analysis is also performed in order to discard the redundant conformations and to keep only the most significant conformers (one for each cluster). In particular, the conformations for which the difference of the average value of the flexible torsion angles is no more than 60 degrees are included in the same cluster. Two files are automatically generated: a trajectory file (* - clust.dcd) with the best conformer of each cluster and a text file (* - clust.ene) with the energy of the best conformer and the number of conformers of each cluster. All calculation parameters can be set by changing the definitions at the beginning of the script source code. For more information on the conformational search, click here.

Dipole.c	It calculates the dipole moment by AMMP. If the charges aren't assigned, they are assigned by the Gasteiger-Marsili method (see AMMP's DIPOLE command).

Interaction analysis.c	It evaluates the non-bond interaction energy between two molecules. This calculation requires two molecules in the workspace: the first one must be the receptor and the second one must be the ligand. For more information, see the ANALYZE command in the AMMP manual. The results are shown in the VEGA ZZ console. AMMP shows the energy for each atom in the selection range:

  Vnonbon internal lys.n 137 Eq -12.860423 E6 -1.397398 E12 2.191076
  Vnonbon external lys.n 137 Eq 16.632879 E6 -5.806829 E12 9.177713
  Vnonbon total lys.n 137 Eq 3.772456 E6 -7.204227 E12 11.368790

where internal is the intramolecular energy, external is the intermolecular (interaction) energy, total is the sum of the intramolecular and intermolecular energies, Eq is the electrostatic (coulombic) energy, and E6 and E12 are the Lennard-Jones terms. At the end of the atom dump, AMMP also shows:

  Vnonbon total internal 151.439880
  Vnonbon total external 2.272158
  Vnonbon total 153.712067
  153.712067 non-bonded energy
  153.712067 total potential energy

where Vnonbon total internal is the total intramolecular energy, Vnonbon total external is the total intermolecular (interaction) energy and Vnonbon total is the total non-bond interaction energy (the sum of Vnonbon total internal and Vnonbon total external). Non-bonded energy and total potential energy are self-explanatory. Finally, the results (Vnonbon total internal, Vnonbon total external and Vnonbon total) are copied to the clipboard.
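
If you need to post-process the per-atom lines, a small parser is enough. The Python sketch below assumes that every per-atom line has exactly the layout of the example quoted in this entry (kind, atom label, atom number, then the Eq, E6 and E12 values); it is not part of VEGA ZZ or AMMP, and the interpretation of "lys.n 137" as atom label and atom number is an assumption.

```python
import re

# Pattern for lines such as:
# Vnonbon internal lys.n 137 Eq -12.860423 E6 -1.397398 E12 2.191076
LINE = re.compile(
    r"Vnonbon\s+(internal|external|total)\s+(\S+)\s+(\d+)\s+"
    r"Eq\s+(-?\d+\.\d+)\s+E6\s+(-?\d+\.\d+)\s+E12\s+(-?\d+\.\d+)"
)


def parse_vnonbon(text):
    """Return {(atom_label, atom_number): {kind: (Eq, E6, E12)}}."""
    atoms = {}
    for match in LINE.finditer(text):
        kind, label, number = match.group(1), match.group(2), int(match.group(3))
        terms = tuple(float(value) for value in match.group(4, 5, 6))
        atoms.setdefault((label, number), {})[kind] = terms
    return atoms


SAMPLE = """\
Vnonbon internal lys.n 137 Eq -12.860423 E6 -1.397398 E12 2.191076
Vnonbon external lys.n 137 Eq 16.632879 E6 -5.806829 E12 9.177713
Vnonbon total lys.n 137 Eq 3.772456 E6 -7.204227 E12 11.368790
"""

atoms = parse_vnonbon(SAMPLE)
energies = atoms[("lys.n", 137)]
# For each term, total should match internal + external within rounding.
for i, term in enumerate(("Eq", "E6", "E12")):
    difference = energies["internal"][i] + energies["external"][i] - energies["total"][i]
    print(term, abs(difference) < 1e-5)
```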

Neural network.c	The AMMP's Kohonen neural network is used to find the 3D space-filling curve corresponding to the structure. If the charges aren't assigned, they are assigned by the Gasteiger-Marsili method (see AMMP's KOHONEN command).

Rigid docking.c	It performs a rigid docking calculation by the genetic algorithm implemented in the AMMP program. This calculation requires two molecules in the workspace: the first one must be the receptor and the second one must be the ligand. The ligand is moved to obtain the complex. Both molecules must have the hydrogens; the charges are automatically assigned (Gasteiger-Marsili method) if they are missing. This script has a graphical user interface (provided by the GraphApp library); to understand the meaning of each field, it's strongly recommended to read the GDOCK documentation.

13.3.5 Build

These scripts build complex structures:

Aromaticity fix.c	It fixes the bond order in aromatic rings, changing the alternated single and double bonds to partial double bonds.

Coordinate transformation.c	This script applies the specified transformation matrix to all atoms or to visible/active atoms only (see the Active atoms only checkbox). It's useful to build multimeric structures from the information included in the REMARK 300 and 350 tags of PDB files:

REMARK 300
REMARK 300 BIOMOLECULE: 1
REMARK 300 THIS ENTRY CONTAINS THE CRYSTALLOGRAPHIC ASYMMETRIC UNIT
REMARK 300 WHICH CONSISTS OF 2 CHAIN(S). SEE REMARK 350 FOR
REMARK 300 INFORMATION ON GENERATING THE BIOLOGICAL MOLECULE(S).
REMARK 350
REMARK 350 GENERATING THE BIOMOLECULE
REMARK 350 COORDINATES FOR A COMPLETE MULTIMER REPRESENTING THE KNOWN
REMARK 350 BIOLOGICALLY SIGNIFICANT OLIGOMERIZATION STATE OF THE
REMARK 350 MOLECULE CAN BE GENERATED BY APPLYING BIOMT TRANSFORMATIONS
REMARK 350 GIVEN BELOW. BOTH NON-CRYSTALLOGRAPHIC AND
REMARK 350 CRYSTALLOGRAPHIC OPERATIONS ARE GIVEN.
REMARK 350
REMARK 350 BIOMOLECULE: 1
REMARK 350 APPLY THE FOLLOWING TO CHAINS: B, A
REMARK 350 BIOMT1 1 1.000000 0.000000 0.000000 0.00000
REMARK 350 BIOMT2 1 0.000000 1.000000 0.000000 0.00000
REMARK 350 BIOMT3 1 0.000000 0.000000 1.000000 0.00000
REMARK 350 BIOMT1 2 -1.000000 0.000000 0.000000 174.00000
REMARK 350 BIOMT2 2 0.000000 -1.000000 0.000000 174.00000
REMARK 350 BIOMT3 2 0.000000 0.000000 1.000000 0.00000

To build this homodimeric macromolecule:
Open the original PDB file.
Run the Coordinate transformation script.
Put in the dialog window the values shown in red.
Click the Apply button.
Reopen the original PDB file in the same workspace of the transformed structure and click Append in the dialog window.
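
The arithmetic behind a BIOMT transformation can be sketched in a few lines of Python. Each BIOMT row holds three rotation coefficients followed by one translation, so the new coordinate is the dot product of the row with (x, y, z) plus the translation. The matrix below is the second operator of the REMARK 350 example; the atom coordinates are hypothetical, and the code is an illustration rather than the script's implementation.

```python
def apply_biomt(coords, matrix):
    """Apply a 3x4 BIOMT matrix (rotation columns plus translation) to (x, y, z) tuples."""
    transformed = []
    for x, y, z in coords:
        transformed.append(
            tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)
        )
    return transformed


# Second BIOMT operator from the REMARK 350 example (BIOMT1/2/3 rows).
BIOMT_2 = [
    [-1.0, 0.0, 0.0, 174.0],
    [0.0, -1.0, 0.0, 174.0],
    [0.0, 0.0, 1.0, 0.0],
]

atoms = [(10.0, 20.0, 30.0)]  # hypothetical atom coordinates
print(apply_biomt(atoms, BIOMT_2))  # prints [(164.0, 154.0, 30.0)]
```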

Graphite.r	This script builds one or more graphite planes.

Nanotube.r	This script builds single-walled carbon nanotube (SWCNT) structures. It's based on VBS code developed by Roberto G. A. Veiga at Instituto de Física - Universidade Federal de Uberlândia (UFU) - Brazil, using the algorithm described in the article: White et al., Phys. Rev. B, 1993, Vol. 47, No. 9, pp. 5485-88.

Peptide library.c	This script builds a peptide library of a given length (Peptide length) starting from a set of residues (Residues to use, usually the 20 natural amino acids). Optionally, you can indicate one or more base peptides, to which the residues are added at the C-terminal side. The base peptides (Base peptides field) must be separated by spaces, commas, semicolons or tabs. The peptides are built as beta-sheets in zwitterionic form, with the side chains ionized according to physiological pH. They are stored in a database (see Output box, Database) in any format supported by VEGA ZZ.

Protein mutagenesis.c	This script generates mutated proteins from a template structure and a list of mutations. As a first step, it asks if you want to perform all possible permutations of the mutations or only one mutation for each column of the mutation file. The output molecules are stored in a database and an additional CSV file is also generated, containing the molecule names and their amino acid sequences. The mutation list file must include one mutation per line in the following format:

ResName:ResNum:ChainID:MolNum List_Of_Aminoacids

where ResName is the name of the residue (max. 4 characters), ResNum is the residue number (max. 4 characters), ChainID is the chain identifier (1 character), MolNum is the molecule number (non-zero, unsigned integer) and List_Of_Aminoacids is the list of the amino acids that will be sequentially substituted (max. 20 characters, single-character amino acid code). ChainID and MolNum are optional parameters, but if you want to specify the molecule number without indicating the chain, you can use * as ID. # and ; at the beginning of a line can be used for remarks. Example:

; Mutation list example
THR:3 EYF
SER:6:Y AL

It generates 6 mutants, involving the residues 3 and 6: EA, YA, FA, EL, YL and FL. Each mutated protein is automatically minimized by NAMD 2 (5000 steps), keeping the backbone fixed.
WARNING: To run this script, the NAMD 2 package and the parm.prm parameter file must be installed. For more details, click here and here.
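
The way the example list expands into six mutants can be reproduced with a short Python sketch. It parses the documented line format and enumerates every combination of one amino acid per listed residue; it covers only the "all possible permutations" mode and is an independent illustration, not the script's code. The dictionary keys are hypothetical names.

```python
from itertools import product


def parse_mutation_list(text):
    """Parse 'ResName:ResNum[:ChainID[:MolNum]] List_Of_Aminoacids' lines."""
    mutations = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":  # skip blank lines and remarks
            continue
        site, residues = line.split()
        fields = site.split(":")
        mutations.append({
            "res_name": fields[0],
            "res_num": fields[1],
            "chain": fields[2] if len(fields) > 2 else None,
            "mol_num": int(fields[3]) if len(fields) > 3 else None,
            "residues": residues,
        })
    return mutations


def all_combinations(mutations):
    """Return one string per mutant, with one letter for each mutated residue."""
    return ["".join(combo) for combo in product(*(m["residues"] for m in mutations))]


EXAMPLE = """\
; Mutation list example
THR:3 EYF
SER:6:Y AL
"""

mutations = parse_mutation_list(EXAMPLE)
mutants = all_combinations(mutations)
print(len(mutants), sorted(mutants))
```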

Protonation fix.c	This script fixes the protonation state of the molecule in the current workspace, removing the acidic hydrogens (bonded to carboxylate, sulfonate, phosphite and phosphate groups) and adding the basic hydrogens (to the nitrogens of primary amines and guanidines).

Solvent cluster racemizer.c	This script creates a racemic mixture from a solvent cluster of chiral molecules built from a single enantiomer. The solvent cluster must be opened in the current workspace.

Stereoisomers.c	This script builds all possible stereoisomers of a chiral molecule, which must be open in the current workspace. Diastereoisomers are automatically minimized (conjugate gradients, 3000 steps, toler 0.01). As a safety measure, the maximum number of chiral centers is limited to 8 (2⁸ = 256 stereoisomers), but it can be increased to 32 by changing the VGP_MAX_CHIRAL_CENTERS and VGP_MAX_CHIRAL_CENTERSSTR definitions. When you start the script, a file requester is shown in which you can choose the output format and the file name that is used as a prefix, because each stereoisomer is named by adding the configuration of all stereocenters. Remember that if the bond order of the starting molecule is assigned incorrectly, the chirality attribution (according to the Cahn-Ingold-Prelog rules) could be wrong.
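
The growth of the number of stereoisomers with the number of chiral centers (2ⁿ) can be seen in the following Python sketch, which enumerates the R/S configurations and appends them to a prefix. The naming pattern (`prefix_RRS`) is hypothetical; the exact names produced by the script are not documented here.

```python
from itertools import product

MAX_CHIRAL_CENTERS = 8  # mirrors the default limit described above


def stereoisomer_names(prefix, n_centers):
    """Return one hypothetical name per R/S configuration of n_centers stereocenters."""
    if n_centers > MAX_CHIRAL_CENTERS:
        raise ValueError(f"too many chiral centers: {n_centers}")
    return [f"{prefix}_{''.join(config)}" for config in product("RS", repeat=n_centers)]


names = stereoisomer_names("molecule", 3)
print(len(names), names[0], names[-1])
print(len(stereoisomer_names("molecule", 8)))
```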

Zero coord.c	It moves the atoms to the specified coordinates. If Active atoms only is checked, only the visible atoms are moved.

13.3.6 Calculation

This directory includes scripts for generic calculations:

APBS membrane energy.c

This script evaluates the energy required by a molecule to leave the hydration shell and to reach a biological membrane. This calculation is performed by APBS and both solvents are implicitly defined by their dielectric constants (78.0 for water and 9.0 for membrane).
This script uses APBS for Windows, which is included in the VEGA ZZ package. APBS is a software package for modeling biomolecular solvation through solution of the Poisson-Boltzmann equation (PBE), developed by Nathan Baker in collaboration with J. Andrew McCammon and Michael Holst.

For more information about APBS, visit http://www.poissonboltzmann.org/apbs/

APBS solvation energy.c

It calculates the solvation energy of the molecule in the current workspace by APBS. The results are shown in VEGA ZZ console and copied to the clipboard.
This script uses APBS for Windows, which is included in the VEGA ZZ package. APBS is a software package for modeling biomolecular solvation through solution of the Poisson-Boltzmann equation (PBE), developed by Nathan Baker in collaboration with J. Andrew McCammon and Michael Holst.

For more information about APBS, visit http://www.poissonboltzmann.org/apbs/

Copy properties.c

This script copies some molecular properties to the clipboard in selective mode.

Database properties.c

This script calculates several properties for all molecules included in a database as 3D structures. The script asks for a database in one of the VEGA ZZ supported formats as input and for a CSV file as output. During the calculation, a log file is also created in which all errors are recorded.
This script is especially useful for those database formats that don't include molecular properties, such as Mol2, SDF and Zip.

Druglikeness.c

This script checks the druglikeness of the molecule in the current workspace. Two methods are used:

Lipinski's rule of five
This rule establishes that an orally active drug must have:

not more than 5 hydrogen bond donors (nitrogen or oxygen atoms with one or more hydrogen atoms);
not more than 10 (2 x 5) hydrogen bond acceptors (nitrogen or oxygen atoms);
a molecular weight under 500 g/mol;
a partition coefficient logP less than 5.

Ghose's rule
This rule establishes that an orally active drug must have:

partition coefficient logP in -0.4 to 5.6 range;
molar refractivity from 40 to 130;
molecular weight from 160 to 480;
number of heavy atoms from 20 to 70.

The molar refractivity is calculated according to the Ghose and Crippen method.
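
Both rule sets reduce to simple threshold checks once the descriptors are known. The Python sketch below applies the limits listed above to descriptor values supplied by the caller; it does not calculate the descriptors, the function and parameter names are hypothetical, and the example values are invented for illustration.

```python
def lipinski_violations(h_donors, h_acceptors, mol_weight, log_p):
    """Return the list of Lipinski criteria that are not satisfied."""
    violations = []
    if h_donors > 5:
        violations.append("hydrogen bond donors")
    if h_acceptors > 10:
        violations.append("hydrogen bond acceptors")
    if mol_weight >= 500:
        violations.append("molecular weight")
    if log_p >= 5:
        violations.append("logP")
    return violations


def ghose_violations(log_p, molar_refractivity, mol_weight, heavy_atoms):
    """Return the list of Ghose criteria that are not satisfied."""
    ranges = {
        "logP": (log_p, -0.4, 5.6),
        "molar refractivity": (molar_refractivity, 40, 130),
        "molecular weight": (mol_weight, 160, 480),
        "heavy atoms": (heavy_atoms, 20, 70),
    }
    return [name for name, (value, low, high) in ranges.items() if not low <= value <= high]


# Hypothetical descriptor values of a small molecule.
print(lipinski_violations(h_donors=1, h_acceptors=3, mol_weight=230.3, log_p=3.2))
print(ghose_violations(log_p=3.2, molar_refractivity=65.0, mol_weight=230.3, heavy_atoms=17))
```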

References:
Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J.
"Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings"
Adv. Drug Delivery Rev. 1997, 23, 3-25.

Ghose, A. K.; Viswanadhan, V. N.; Wendoloski, J. J.
"A Knowledge-Based Approach in Designing Combinatorial or Medicinal Chemistry Libraries for Drug Discovery. 1. A Qualitative and Quantitative Characterization of Known Drug Databases"
J. Comb. Chem. 1999, 1, 55-68.

Electrostatic energy.c

It evaluates the electrostatic energy of the molecule in the current workspace. The default dielectric constant is 1 (vacuum).

Log kw IAM.MG/DD2.c

Since the scale of log k_w^IAM values was frequently found to mimic the drug/membrane interactions actually occurring in vivo better than lipophilicity in n-octanol, this script implements a method to predict the k_w^IAM for both MG and DD2 chromatographic columns. In particular, you can estimate the retention as log k_w values for the molecule in the current workspace or, alternatively, for any molecule in the PubChem database. The results can be copied to the clipboard. If the descriptors used for the prediction or the calculated log k_w are outside the prediction domain, warning messages are shown in the console.

You can choose between two prediction methods that use two different approaches to predict log P: the former, more accurate, is based on miLogP and requires sending the data to the Molinspiration software; the latter, less accurate, is based on virtual log P and runs off-line. If you have to manage sensitive data that you don't want to share on the Web, you should choose the second method.

The miLogP-based predictions use these two correlative equations:

log k_w^IAM.MG = -0.1405 + 0.4401 miLogP + 0.0536 HeavyAtoms - 0.0833 HLB_M - 0.0435 FlexDihedrals
n = 204; r² = 0.81; q² = 0.80; SE = 0.438; F = 213.92; P < 1.0 × 10⁻⁸; PC = 39.403

log k_w^IAM.DD2 = -2.3989 + 0.4936 miLogP + 0.4354 Vdiam - 0.0640 HLB_PSA - 0.0497 FlexDihedrals
n = 160; r² = 0.85; q² = 0.84; SE = 0.459; F = 212.94; P < 1.0 × 10⁻⁸; PC = 33.974

where:

FlexDihedrals	= number of flexible dihedral angles;
HeavyAtoms	= number of heavy atoms;
HLB_M	= hydrophilic-lipophilic balance (HLB) as mean of HLB_D, HLB_G and HLB_PSA;
HLB_PSA	= hydrophilic-lipophilic balance (HLB) calculated as ratio between PSA and total surface;
Vdiam	= volume diameter in Å.

The virtual log P-based predictions use these other correlative equations:

log k_w^IAM.MG = -0.3867 + 0.4159 VirtualLogP + 0.0741 HeavyAtoms - 0.0806 HLB_G - 0.0657 FlexDihedrals
n = 205; r² = 0.75; q² = 0.74; SE = 0.501; F = 151.79; P < 1.0 × 10⁻⁸; PC = 51.739

log k_w^IAM.DD2 = -3.0812 + 0.4809 VirtualLogP + 0.5464 Vdiam - 0.0765 HLB_PSA - 0.0829 FlexDihedrals
n = 161; r² = 0.80; q² = 0.79; SE = 0.523; F = 155.22; P < 1.0 × 10⁻⁸; PC = 44.319

where:

HLB_G	= hydrophilic-lipophilic balance (HLB) calculated with Griffin's method.

WARNING:
These two equations are valid only for neutral non-ionic molecules.
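
The four correlative equations of this script can be evaluated directly once the descriptors are available. The Python sketch below only transcribes the coefficients given in this section; the function and parameter names are hypothetical, the descriptor values in the example are invented, and no prediction-domain check is performed.

```python
def log_kw_mg_milogp(milogp, heavy_atoms, hlb_m, flex_dihedrals):
    """log k_w^IAM.MG from the miLogP-based equation."""
    return (-0.1405 + 0.4401 * milogp + 0.0536 * heavy_atoms
            - 0.0833 * hlb_m - 0.0435 * flex_dihedrals)


def log_kw_dd2_milogp(milogp, vdiam, hlb_psa, flex_dihedrals):
    """log k_w^IAM.DD2 from the miLogP-based equation."""
    return (-2.3989 + 0.4936 * milogp + 0.4354 * vdiam
            - 0.0640 * hlb_psa - 0.0497 * flex_dihedrals)


def log_kw_mg_virtual(virtual_logp, heavy_atoms, hlb_g, flex_dihedrals):
    """log k_w^IAM.MG from the virtual log P-based equation."""
    return (-0.3867 + 0.4159 * virtual_logp + 0.0741 * heavy_atoms
            - 0.0806 * hlb_g - 0.0657 * flex_dihedrals)


def log_kw_dd2_virtual(virtual_logp, vdiam, hlb_psa, flex_dihedrals):
    """log k_w^IAM.DD2 from the virtual log P-based equation."""
    return (-3.0812 + 0.4809 * virtual_logp + 0.5464 * vdiam
            - 0.0765 * hlb_psa - 0.0829 * flex_dihedrals)


# Hypothetical descriptors of a neutral molecule.
mg = log_kw_mg_milogp(milogp=3.0, heavy_atoms=17, hlb_m=5.0, flex_dihedrals=3)
dd2 = log_kw_dd2_milogp(milogp=3.0, vdiam=7.0, hlb_psa=4.0, flex_dihedrals=3)
print(f"log k_w IAM.MG  = {mg:.3f}")
print(f"log k_w IAM.DD2 = {dd2:.3f}")
```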

Mopac.r

It runs multiple Mopac jobs.

XLOGP2.c

It calculates the logP by the XLOGP V2 method. The result is shown in the VEGA ZZ console and copied to the clipboard.
This script requires X-Score 1.3 for Windows, which is not included in the VEGA ZZ package. To install X-Score, read the X-Score script manual.

For more information about X-Score and XLOGP, visit http://www.sioc-ccbg.ac.cn/

13.3.7 Color

Scripts to color the molecule:

Color RasMol.c	It colors the molecule in the current workspace according to the RasMol color scheme.

Color VMD.c	It colors the molecule in the current workspace according to the VMD color scheme.

13.3.8 Common

This directory contains the initialization scripts to include in REBOL scripts:

Fmod.r	Fmod commands.

Formats.r	File format keywords and other definitions.

Utils.r	Functions for path manipulation.

Vega.r	VEGA ZZ interface (don't change it without a good reason!).

Vutils.r	REBOL/View utilities.

The C header files contained in this directory are hidden and can't be changed directly from the VEGA ZZ environment.

13.3.9 Communication

This directory includes communication and Internet-related scripts:

Download molecule from URL.c	This script downloads a molecule from a given URL.

E-mail PDB send.c	This script saves the current molecule in PDB format, compresses it and attaches it to a user-editable e-mail. This script uses the MAPI layer, so it's compatible with MAPI-compliant e-mail clients only (e.g. Outlook, Outlook Express, etc.). To change the output format or other settings, see the script source code.

13.3.10 Database

This directory includes scripts to manage databases:

Count functional groups.c	This script counts the functional groups of each molecule in a database. The functional groups are recognized by the Kier-Hall SMARTS template, but you can also use ATDL templates such as GROUPS, GROUPS_EXT, TRIPOS, etc. You can change the template by editing the VGS_DEF_TEMPLATE constant in the code. If the input database supports SQL, you can decide to save the data in the same database or in a separate CSV file. These data are useful for QSAR analyses in which you need to recognize and count the functional groups.

Database expander.r	It's a REBOL/View script that extracts the molecules contained in a database to a directory. It lets you specify the file format, the compression and the save attributes (connectivity and constraints).

Database logP.c	It calculates the logP by Testa's MLP method for each molecule in the database and export the results in a CSV (Output file) file. The input must be a supported database (Input database) and its structures can be pre-processed adding the hydrogens (Add the hydrogens) applying the geometry method (default) or the bond order method (Use bond order). This last method is recommended if the molecules have an assigned bond order. In the pre-processing phase, the structures can be optimized by the steepest descend (Steepest minimization) and/or the conjugate gradients (Conjugate minimization) methods. For both minimization algorithm, it's possible to put the number of iterations (Steps), the toler value (Toler) and the dielectric constant (Dielectric). Checking Update the graphic, the 3D graphic output is updated every 20 minimization steps. Increasing the Dot density value, it's possible to make a better prediction of the logP. A good value is from 10 to 50 dots for Å². Warning: even if in the theory it's possible to manage a 2D database, adding the hydrogens by the bond order method and optimizing the structures, this procedure is not recommended because the distance geometry optimization is not performed. For this reason, a better choice is the conversion of the database from 2D to 3D (see the Database 2D to 3D.c script) and the resulting database can be used directly to predict the logP values.

Database volume.c	It calculates the volume of each molecule in the database. It has the same options as the Database logP.c script.

Database to 0D.c	This script converts a 2D or 3D database to a 0D SDF database, translating all atoms to the specified coordinates, usually (0, 0, 0).

DrugBank SDF fix.c	The DrugBank SDF files aren't standard, because the header of each record has only two lines instead of three and the first line contains a tab character separating the molecule name from the DrugBank ID. This script creates a new file, adding the _fix.sdf suffix to the file name, and fixes the records by adding the missing line and removing the tab character and the "SDF file of " string from the molecule name line.

Force field check.c	This script assigns the force field to each molecule in the database and checks if it is correctly assigned. An output file is created in the same directory as the database file and named as the database followed by the - force field check.txt suffix. This script is useful to check if there are problems in the atom type assignment before running a virtual screening calculation.

Mol2 merge.c	It joins two or more databases in Mol2 format into a new file. This script doesn't perform any change to the data and therefore it's extremely fast.

SDF merge.c	It joins two or more databases in SDF format into a new file. This script doesn't perform any change to the data and therefore it's extremely fast.

SDF metadata extractor.c	This script extracts the metadata (e.g. InChI, SMILES, biological activity, etc.) from an SDF file and puts it into a Comma Separated Values (CSV) file. The output file is placed in the same directory as the source database and its name is generated from the source name by adding the _meta.csv suffix.
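As an illustration of the idea (not the code of SDF metadata extractor.c itself), the following Python sketch collects the data items of each SDF record and writes them as CSV. The function names and the choice of a "Name" column taken from the first line of each record are assumptions made for this example.

```python
import csv
import io


def sdf_metadata_rows(sdf_text):
    """Return one dict of data items per SDF record (records end with '$$$$')."""
    rows = []
    fields = {}
    tag = None    # name of the data item currently being read, if any
    title = None  # first line of the current record (molecule name)
    for line in sdf_text.splitlines():
        if title is None:
            title = line.strip()
            continue
        if line.strip() == "$$$$":
            rows.append({"Name": title, **fields})
            fields, tag, title = {}, None, None
        elif line.startswith(">"):
            # Data header such as "> <SMILES>": the tag sits between '<' and '>'.
            start, end = line.find("<"), line.rfind(">")
            tag = line[start + 1:end] if 0 <= start < end else None
            if tag:
                fields[tag] = ""
        elif tag is not None:
            if line.strip() == "":
                tag = None  # a blank line closes the data item
            else:
                fields[tag] = (fields[tag] + " " + line.strip()).strip()
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text; columns appear in order of first use."""
    columns = ["Name"]
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Records that lack a given data item simply get an empty cell in that column.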

SMILES to database.c	This script converts the SMILES molecules of a CSV file to 3D and puts them in a database. The CSV file must have two fields for each line separated by a semicolon (;): the former must be the molecule name and the latter must be the SMILES string.
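The expected input layout can be checked before running the conversion. The Python sketch below only parses the two-field, semicolon-separated file described above; it does not generate 3D structures, and the function name is hypothetical.

```python
import csv
import io


def read_smiles_csv(text):
    """Parse 'name;SMILES' lines into (name, smiles) tuples, skipping blank lines."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row or all(not cell.strip() for cell in row):
            continue
        if len(row) != 2:
            raise ValueError(f"expected 2 fields, got {len(row)}: {row!r}")
        name, smiles = (cell.strip() for cell in row)
        records.append((name, smiles))
    return records
```

For example, a file containing the lines `ethanol;CCO` and `benzene;c1ccccc1` yields two (name, SMILES) pairs, while a line with three fields is rejected.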

Splitter.c	This script splits a database into two or more files, which can be useful to distribute calculations over different PCs. The Input database must be in one of the formats supported by VEGA ZZ.
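A minimal sketch of the splitting idea, assuming an SDF database whose records end with a `$$$$` line. How Splitter.c actually distributes the molecules is not documented here, so the even distribution below is an assumption, as are the function names.

```python
def split_sdf_records(sdf_text):
    """Split SDF text into a list of records, each ending with the '$$$$' line."""
    records, current = [], []
    for line in sdf_text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    return records


def split_into_parts(records, parts):
    """Distribute records over `parts` chunks whose sizes differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(records), parts)
    chunks, start = [], 0
    for index in range(parts):
        size = base + (1 if index < extra else 0)
        chunks.append(records[start:start + size])
        start += size
    return chunks
```

Each chunk can then be written to its own file and sent to a different machine.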

Subset creator.c	It creates a new database in SQLite format, including a subset of the molecules of another database. The molecules must be specified in a text file containing one molecule name (not ID) per line. The subset database is created in the same directory as the source, preserving its name as prefix and adding the _subset.db suffix. A log file reporting possible problems is also generated. This script was specially developed to prepare input databases for virtual screening studies.

ZINC get by ID.c	This script downloads a structure from ZINC database to the current workspace by specifying the molecule ID. If the code is wrong or the entry doesn't exist, an error message is shown.

13.3.11 Development tools

Scripts for development.

Decision tree to C converter.c

This program converts the machine learning models, in particular the classification trees, generated by Weka program to C source code and requires no or very limited modifications to be used. It is the conversion of Tree2C command line program to a VEGA VLL extension.

Weka model preparation - Mini how to
This part of the manual is not meant to be exhaustive; more information can be found in the Weka manual and tutorials.

Start Weka and choose Explorer as application.
In the Preprocess tab, click Open file... and select the ARFF input file. This file may need to be pre-processed to be usable (use the Edit button).
Go to the Classify tab and choose the classifier (press the Choose button in the Classifier box). For example, select RandomForest in trees.
In the options of the classifier (click on the classifier parameters of Classifier box), set printClassifiers option to True to generate the right output with the tree models.
Press Start button to generate the model.
If the model is acceptable, save it by clicking with the right mouse button on Result list and choose Save result buffer. Put the file name adding .txt extension and press Save.

Decision tree conversion to C

Run this program and select the output text file in Weka tree model.
Optionally, you can also choose the ARFF file used to generate the model. Its data will be used to generate the code that checks if the calculated attributes are included in the classification domain, defined as the range of the properties used to build the model.
Set the output file name (Output C file).
You can specify the name of the model if the proposed one is not ok (Model name field).
Optionally, you can specify the labels printed as output for each class (Class labels field). The labels must be comma separated and their number must be the same as the number of classes detected by the model.
Select the source code type to generate. More in detail you can choose:
- VEGA ZZ C-script: create a C-script for VEGA including the code to calculate the known molecular properties.
- C source + header: generic code for classification. You can also use it outside VEGA.
- Header only: same as above, with the difference that all code is written in the header file.
If you are building a VEGA ZZ C-script and you want to install it into VEGA ZZ environment, you can check Install VEGA ZZ C-Script and put the installation directory in Script directory field. If you leave this field empty, the C-Script will be installed in home directory. You can use the disk button to explore the directory tree and you must remember you cannot install scripts outside the directory tree of the scripts.
Finally, click the Convert button.

When you choose VEGA ZZ C-script, the attribute names are analyzed and, if they can be calculated by VEGA ZZ, the right code is automatically added to the output. Otherwise, a warning message is shown because the resulting code will be incomplete and requires further implementation by the user.
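The classification domain check mentioned in the conversion steps can be pictured as a range test on each attribute. The Python sketch below is only an illustration of that idea, not the code emitted by the converter; the function names and the attribute names used in the usage note are hypothetical.

```python
def attribute_ranges(training_rows):
    """Return {attribute: (min, max)} over a list of {attribute: value} dicts."""
    ranges = {}
    for row in training_rows:
        for name, value in row.items():
            low, high = ranges.get(name, (value, value))
            ranges[name] = (min(low, value), max(high, value))
    return ranges


def outside_domain(sample, ranges):
    """List the attributes of `sample` whose value falls outside the training range."""
    return [name for name, (low, high) in ranges.items()
            if name in sample and not (low <= sample[name] <= high)]
```

With training rows spanning logP 1.0 to 3.5 and mass 180 to 420, a molecule with mass 500 would be reported as outside the domain for the mass attribute.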

13.3.12 DNA tools

Scripts for the manipulation of the nucleic acids (DNA, RNA and PNA).

DNA to PNA.c	This tool converts the DNA to PNA, acting only on the selected atoms.

RNA to DNA.c	This tool converts the RNA to DNA single stranded, acting only on the selected atoms.

13.3.13 Docking

Scripts for molecular docking.

13.3.13.1 AutoDock

These scripts prepare input files for AutoDock 4:

Box calc.c	It calculates the box dimensions and its center coordinates containing the active (visible) atoms and shows the results in the console. This script is useful to define a macromolecule region to dock ligands.
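The box calculation amounts to an axis-aligned bounding box of the visible atoms. A minimal Python sketch, assuming the coordinates are already available as (x, y, z) tuples; the function name is hypothetical and no padding is added around the atoms.

```python
def bounding_box(coords):
    """Return (center, size) of the axis-aligned box enclosing the coordinates."""
    if not coords:
        raise ValueError("at least one atom is required")
    xs, ys, zs = zip(*coords)
    mins = (min(xs), min(ys), min(zs))
    maxs = (max(xs), max(ys), max(zs))
    center = tuple((low + high) / 2 for low, high in zip(mins, maxs))
    size = tuple(high - low for low, high in zip(mins, maxs))
    return center, size
```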

DLG to PDB multimodel.c	It converts an AutoDock 4 DLG output to a standard PDB multimodel file, keeping the energy information in the remarks. This conversion is not required by VEGA ZZ, which reads DLG files as trajectories, but is needed by programs that are unable to manage this kind of file.

Ki calculator.c	It evaluates the Ki and the interaction energy of a given ligand - receptor complex. This script is useful to recalculate the AutoDock 4 score after an energy minimization (e.g. performed by NAMD). This calculation requires at least two molecules in the workspace and atom constraints defining the region in which the AutoDock 4 grid maps will be calculated. Only the free atoms are considered to define this region. If there are more than two molecules or the ligand is ambiguous, the script asks you to specify the molecule ID of the ligand. The results are shown in the VEGA ZZ console and copied to the clipboard.
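For reference, an inhibition constant is commonly derived from a binding free energy through Ki = exp(ΔG / RT). The Python sketch below applies that relation; the gas constant value, the default temperature of 298.15 K and the function name are assumptions of this example, and the constants used by Ki calculator.c may differ.

```python
import math

GAS_CONSTANT = 1.98719  # cal / (mol K), assumed value


def inhibition_constant(delta_g_kcal, temperature=298.15):
    """Convert a binding free energy (kcal/mol) into Ki (mol/l)."""
    return math.exp(delta_g_kcal * 1000.0 / (GAS_CONSTANT * temperature))
```

With these constants, a free energy of -10 kcal/mol corresponds to a Ki of a few tens of nanomolar, and more negative energies give smaller Ki values.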

Ligand.c	By this script, you can prepare the current molecule to be used as ligand with AutoDock 4, performing these steps:
If needed, adds the hydrogens by protein method.
If required, assigns the atom charges.
If the molecule has two dimensions only, the 2D to 3D conversion is performed as follows:
- Sends the molecule to AMMP.
- Performs the Gauss-Seidel distance geometry optimization (15 steps).
- Performs the steepest descent energy minimization (50 steps, toler = 1).
- Performs the conjugate gradients energy minimization (3000 steps, toler = 0.01).
- Sends the resulting structure to VEGA ZZ.
These steps are performed for both 2D and 3D structures:
- Fixes the atom types, applying the AutoDock force field.
- Removes the apolar hydrogens.
- Saves the molecule in PDBQT format.

Receptor.r	By this script, you can prepare the current molecule as receptor for AutoDock 4, performing these steps:
If needed, adds the hydrogens by protein method.
If required, assigns the atom charges.
Fixes the atom types, applying the AutoDock force field.
Removes the apolar hydrogens.
Saves the molecule in PDBQT format.
Runs AutoGrid4 to calculate the maps if the user confirms the operation.
The pre-defined docking box is set to explore the entire receptor, but if you want to explore a specific protein region, you must select the atoms defining that region before running the script. The grid spacing is automatically adjusted if the number of grid points exceeds 63, because AutoGrid 4 and AutoDock 4 can't manage grids larger than 63x63x63 points.
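The spacing adjustment can be sketched as follows. This is an assumption-laden illustration: the point counting (intervals plus one), the default spacing of 0.375 Å and the function name are choices made for this example, and the exact rule applied by Receptor.r may differ.

```python
import math

MAX_POINTS = 63  # limit per axis quoted in the text above


def adjust_spacing(box_size, spacing=0.375):
    """Return (spacing, points) so that no axis has more than MAX_POINTS grid points."""
    longest = max(box_size)
    needed = math.ceil(longest / spacing) + 1
    if needed > MAX_POINTS:
        # Stretch the spacing so the longest side fits exactly in the limit.
        spacing = longest / (MAX_POINTS - 1)
    points = tuple(min(MAX_POINTS, math.ceil(side / spacing) + 1)
                   for side in box_size)
    return spacing, points
```

A small box keeps the default spacing, while a box with a 62 Å side is covered with a coarser 1 Å grid.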

13.3.13.2 PLANTS

These scripts are useful to manage PLANTS docking software.

Docking.c	This script performs a molecular docking or a virtual screening calculation with the PLANTS software, which must be installed as explained in the manual and in the PLANTS node of the script tree. The receptor and the ligand must be in Sybyl Mol2 format; to run a virtual screening, the ligands must be included in a Mol2 database (Mol2 multimodel format). In the graphic interface, the following parameters can be set:
Receptor	File name of the target macromolecule (receptor) in Mol2 format.
Ligand	File name of the ligand to dock in Mol2 format.
Output directory	Directory in which the output files will be created. This field is automatically completed by selecting receptor and ligand.
Flex. residues	List of residues (space separated, in ResNum format) whose side chains will be considered flexible. Clicking the Get button fills the field automatically with the residues that are active in the current workspace.
Center	X, Y, Z coordinates of the binding site center.
Radius (Å)	Radius in Å of the sphere including the binding site atoms. Clicking the Calc. button fills the Center and Radius fields automatically, considering as binding site the visible atoms of the molecule shown in the current workspace.
Clusters	Number of solution clusters.
RMSD	Root Mean Square Deviation for the cluster analysis.
Multimodel output	If checked, all solutions are saved in a single multimodel file in Mol2 format.
Atom scoring	If checked, the scoring values of each atom are saved in the Mol2 output. The atomic charges are replaced by the scoring values.
Rigid ligand	The ligand is kept rigid.
Shape constraints	In these fields, you can specify the molecule and the weight used for the volume overlap calculation (the more ligand atoms overlap, the better). For more information, read the PLANTS manual.
Score	Scoring function (chemplp, plp and plp95).
Search	Search mode: speed1 (highest reliability, slowest settings), speed2 (good reliability, twice as fast as speed1) and speed4 (modest reliability, four times as fast as speed1).
Clicking the Run button starts the calculation and shows a window in which the run can be stopped by clicking the Abort button. WARNING: if you close VEGA ZZ, the PLANTS calculation is not stopped, but when it finishes, the script doesn't convert the output files to be read directly by Microsoft Excel. For more information about PLANTS, visit http://www.tcd.uni-konstanz.de/
PLANTS installation:
Complete the registration form in the download page at http://www.tcd.uni-konstanz.de/
Download the PLANTS Win32 (minGW).
Rename the file to Plants.exe and copy it to the ...\VEGA ZZ\Bin\Win32 directory, where ...\VEGA ZZ is the VEGA installation directory.
Download mingwm10.dll and copy it to the ...\VEGA ZZ\Bin\Win32 directory.
If you installed the 1.1 version built by Mingw32, it's strongly recommended to patch it by running the Patch bin 1.1 script.
Patch bin 1.1.c	This script applies a patch to the PLANTS 1.1 binary (Mingw32 version) in order to fix the S.O and S.O2 atom types, which are wrongly defined as S.o and S.o2. A backup copy of the original version of PLANTS is made in the ...\VEGA ZZ\Bin\Win32 directory (Plants.bak). WARNING: To run this script, you need administrative rights, otherwise it will be impossible to patch PLANTS. If User Account Control (UAC) is enabled, you must run VEGA ZZ as administrator. To do so, right-click the VEGA ZZ icon on the desktop and select Run as administrator.
Receptor.c	This script saves the receptor in the current workspace to be used in PLANTS calculations. In particular, it marks the backbone atoms and bonds by BACKBONE label that is required to consider the flexibility of the receptor side chains during the docking. WARNING: If you don't need to consider the receptor flexibility, you can save a normal Sybyl Mol2 file from VEGA ZZ main menu.
Rescore ChemPlp.c Rescore Plp.c Rescore Plp95.c	It evaluates the ligand - receptor interaction energy by ChemPlp, Plp and Plp95 scoring functions implemented in PLANTS. This calculation requires at least two molecules in the workspace and if there are more than two molecules or the ligand is ambiguous, the script asks to specify the molecule ID of the ligand. The results are shown in VEGA ZZ console and copied to the clipboard. This script requires PLANTS for Windows that is not included in VEGA ZZ package.
RMSD calc.c	This script calculates the root mean square deviation (RMSD) of a given set of poses obtained by a docking calculation. The first pose of each ligand, which in the case of PLANTS is the best ranked, is taken as the reference structure. The script also calculates the RMSD after aligning each pose to the reference one (ALNRMSD), which is useful to evaluate the conformational changes between the poses, and the mean values of both types of RMSD. You must specify only the database including the docking poses (without the target/protein) and the name of the output CSV file. You can also give as input a database including both receptors and ligands. The script tries to detect the ligand automatically and, when that is not possible, a requester is shown.
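The plain (non-aligned) RMSD against the first pose can be sketched in a few lines of Python. The example assumes that every pose lists the same atoms in the same order as (x, y, z) tuples; the aligned variant (ALNRMSD) would need a superposition step that is not shown here, and the function names are hypothetical.

```python
import math


def rmsd(reference, pose):
    """Root mean square deviation between two equally ordered coordinate lists."""
    if not reference or len(reference) != len(pose):
        raise ValueError("poses must be non-empty and have the same atom count")
    total = sum((a - b) ** 2
                for p, q in zip(reference, pose)
                for a, b in zip(p, q))
    return math.sqrt(total / len(reference))


def rmsd_to_first_pose(poses):
    """RMSD of every pose relative to the first one, which acts as reference."""
    reference = poses[0]
    return [rmsd(reference, pose) for pose in poses]
```

The mean RMSD of a ligand is then the arithmetic mean of the returned list.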

13.3.13.3 Vina

These scripts prepare input files for AutoDock Vina and run it:

Docking.c	This script performs a molecular docking calculation using AutoDock Vina. The receptor and the ligand files must be in PDBQT format and can be prepared by the Receptor.c and Ligand.c scripts. In the graphic interface of this script, you can specify the following parameters:
Receptor	File name of the target macromolecule (receptor) in PDBQT format.
Ligand	File name of the ligand to dock in PDBQT format.
Output model	File name of the ligand poses in PDBQT multimodel format. Remember that this file doesn't include the receptor structure.
Log file	Vina log file name.
Center	X, Y, Z coordinates of the binding site center.
Size (Å)	Dimensions in Å of the cube including the binding site.
Exhaustiveness	Exhaustiveness of the global search (roughly proportional to time).
Binding modes	Maximum number of binding modes to generate.
For more information about AutoDock Vina, click here.

Ligand.c	It prepares and saves the current molecule as ligand for Vina, performing these steps:
If needed, adds the hydrogens by protein method.
If required, assigns the atom charges.
If the molecule has two dimensions only, the 2D to 3D conversion is performed as follows:
- Sends the molecule to AMMP.
- Performs the Gauss-Seidel distance geometry optimization (15 steps).
- Performs the steepest descent energy minimization (50 steps, toler = 1).
- Performs the conjugate gradients energy minimization (3000 steps, toler = 0.01).
- Sends the resulting structure to VEGA ZZ.
These steps are performed for both 2D and 3D structures:
- Fixes the atom types, applying the Vina force field.
- Removes the apolar hydrogens.
- Saves the molecule in PDBQT format.

Receptor.c	It prepares and saves the molecule in the current workspace as receptor for Vina, performing these steps: If needed, adds the hydrogens by protein method. If required, assigns the atom charges. Fixes the atom types, applying the Vina force field. Removes the apolar hydrogens. Saves the molecule in PDBQT format.

Virtual screening.c	This script performs structure-based virtual screenings with AutoDock Vina. To run them, you need:
- the receptor structure in PDBQT format. You can prepare it from any type of file using the Receptor.c script;
- the database containing the ligands to screen. It can be in any format supported by VEGA ZZ (Microsoft Access, Merck MMD, Mol2 multimodel, ODBC data source, SDF file, SQLite and Zip archive).
The database doesn't need to be prepared before the screening, because the script can detect the missing features and fix them. In particular, it can add hydrogens using the best strategy, fix the atomic charges and convert structures from 2D to 3D. The graphic user interface of this script lets you set up the screening easily by changing the following parameters:
Receptor	File name of the target macromolecule (receptor) in PDBQT format.
Energies	Output file in localized CSV format containing the energy of the best pose for each ligand. The first column is the molecule progressive number (MolID), the second one is the molecule name (Name) and the third one is the Vina energy of the best pose (Energy).
Ligand database	Database of the ligands to screen.
Output models	File name of the Zip archive in which the poses in PDBQT multimodel format are stored. The script adds a numerical suffix to the file name, which is incremented automatically every time the file size exceeds the limit of 2 GB.
Log file	Log file name.
Center	X, Y, Z coordinates of the binding site center.
Size (Å)	Dimensions in Å of the cube including the binding site.
Exhaustiveness	Exhaustiveness of the global search (roughly proportional to time, default 8).
Binding modes	Maximum number of binding modes to generate (default 1).
Clicking the Calc button fills the Center and Size fields automatically using the atoms selected in the current workspace, which will be considered as the binding site. Clicking Save cfg saves the current configuration, which can be restored by clicking Load cfg. The resulting .vcf file is not compatible with Vina, while the one generated by Docking.c maintains the compatibility (see the --config option of Vina).
About the restart: the restart procedure is automatically performed if the energy CSV file is found. A requester window lets you choose whether to restart the calculation or to run it from the beginning. For more information about AutoDock Vina, click here.

If you want to run a Vina docking calculation, follow these steps:

Open the ligand in VEGA ZZ.
Run the Ligand.c script and save the ligand.
Open the receptor in another workspace.
Run the Receptor.c script and save the receptor in the same directory as the ligand.
Run Docking.c and enter the ligand and the receptor file names. Log file and Solutions fields are automatically completed.
In VEGA ZZ, select the atoms defining the binding pocket in which to dock the ligand. The proximity method of the custom selection tool can help you.
In the Vina docking window, click Calc. to fill the Center and Dimensions fields automatically.
If you want, change the default docking parameters. For more information, read the Vina manual.
Press Run button to start the docking.
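The parameters set in these steps correspond to entries of a Vina configuration file (the file passed to Vina with --config). The Python sketch below builds such a file as text; the helper name, the three-decimal formatting and the file names in the usage note are assumptions of this example, and the layout written by Docking.c may differ.

```python
def vina_config(receptor, ligand, out, center, size,
                exhaustiveness=8, num_modes=9):
    """Return the text of a Vina configuration file for a single docking run."""
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"out = {out}",
    ]
    # Binding site center and box dimensions, one entry per axis.
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value:.3f}")
    for axis, value in zip("xyz", size):
        lines.append(f"size_{axis} = {value:.3f}")
    lines.append(f"exhaustiveness = {exhaustiveness}")
    lines.append(f"num_modes = {num_modes}")
    return "\n".join(lines) + "\n"
```

For instance, `vina_config("rec.pdbqt", "lig.pdbqt", "out.pdbqt", (1.5, -2.0, 3.25), (20, 20, 20))` returns eleven lines that can be saved to a text file and reviewed before starting the run.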

13.3.13.4 Other docking scripts

Here are other scripts for generic analysis.

APBS binding energy.c	This script evaluates the binding energy of a given ligand - receptor complex. This calculation requires at least two molecules in the workspace and if there are more than two molecules or the ligand is ambiguous, the script asks to specify the molecule ID of the ligand. The results are shown in VEGA ZZ console and copied to the clipboard. This script uses APBS for Windows that is included in VEGA ZZ package. APBS is a software for modeling biomolecular solvation through solution of the Poisson-Boltzmann equation (PBE), developed by Nathan Baker in collaboration with J. Andrew McCammon and Michael Holst. For more information about APBS, visit http://www.poissonboltzmann.org/apbs/

Best score of isomers.c	This script was developed with the aim to manage the docking results obtained when the database of ligands was expanded with stereoisomers, geometric isomers and tautomers of each molecule. It chooses the best isomer of a molecule on the basis of the best (lowest) docking score. When you run the script, you must put the input file in CSV format including the data (molecule name, scores etc) of all docked species, the output CSV file, the column with the ligand names and the column of the score. The isomers are detected by name: they must share the same prefix followed by the underscore character ("_").
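The selection rule can be illustrated with a short Python sketch. It assumes that the prefix is everything before the last underscore of the name and that the input has already been reduced to (name, score) pairs; both points, like the function name, are assumptions of this example.

```python
def best_isomers(rows):
    """Map each name prefix to the (name, score) pair with the lowest score."""
    best = {}
    for name, score in rows:
        prefix = name.rsplit("_", 1)[0]  # text before the last underscore
        if prefix not in best or score < best[prefix][1]:
            best[prefix] = (name, score)
    return best
```

Given the rows `mol1_1` (-7.2), `mol1_2` (-8.1) and `mol2_1` (-6.0), the function keeps `mol1_2` for `mol1` and `mol2_1` for `mol2`.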

Contact surface.c	This script measures the ligand/receptor contact surface (shared surface) in a complex. The results are automatically copied to the clipboard and are: contact surface, percentage of contact surface referred respectively to the ligand, receptor and complex surfaces. All data are expressed in Å².

Fred2 scrore.c	It calculates the interaction score of a ligand - protein complex using OpenEye's Fred2 docking software. This calculation requires two molecules in the workspace: the first one must be the receptor and the second one must be the ligand. The scores extracted from Fred's outputs are: Chemgauss2, Chemscore, Plp, Screenscore, Shapegauss and Zapbind. The results are automatically copied to the clipboard. Warning: This script requires Fred2 installed on your PC. You can request/buy it at http://www.eyesopen.com/

GOLD score extractor.c	This script extracts the docking scores of each pose stored in the mol2 file generated by GOLD and saves them into a csv file. The output file is created in the same directory of the mol2 one and is named as XXX_GOLD.csv, where XXX is the name of the source file.

Hypervolume analyzer.c	This script calculates the shared area (hyperarea) and the shared volume of a set of multiple poses (hypervolume) obtained by a docking calculation. As input, you must specify the database with the docking poses in one of the formats supported by VEGA ZZ, while the output CSV file is saved in the same directory of the database with the name DATABASE_PREFIX - HyperVol.csv. In the output file, you can find the following columns: Name: Name of the ligand; Poses: Number of poses; Area: Area of the first pose; HyperArea: Shared area of the set of docking poses; DeltaArea: HyperArea - Area; RatioArea: HyperArea / Area. Volume: Volume of the ligand; HyperVolume: Shared volume of the set of docking poses; DeltaVolume: HyperVolume - Volume; RatioVolume: HyperVolume / Volume. The multiple poses of the same molecule are detected by their names: they must share the same prefix followed by the underscore character ("_").

Mean score of multiple poses.c	This script calculates mean, minimum, maximum, range and standard deviation of docking scores for all poses of each ligand. When you run the script, you must put the input file in CSV format including the data (molecule name, scores etc) of all docked species, the output CSV file, the column with the ligand names and the column of the score. The ligands are detected by name: they must share the same prefix followed by the underscore character ("_").
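The per-ligand statistics can be sketched in Python as follows. The grouping by the text before the last underscore and the use of the sample standard deviation are assumptions of this example; the script may use a different convention, and the function name is hypothetical.

```python
import statistics
from collections import defaultdict


def pose_statistics(rows):
    """Summarize the scores of all poses of each ligand from (name, score) pairs."""
    groups = defaultdict(list)
    for name, score in rows:
        groups[name.rsplit("_", 1)[0]].append(score)
    summary = {}
    for ligand, scores in groups.items():
        summary[ligand] = {
            "mean": statistics.mean(scores),
            "min": min(scores),
            "max": max(scores),
            "range": max(scores) - min(scores),
            # A single pose has no spread, so report 0.0 in that case.
            "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        }
    return summary
```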

Mopac binding enthalpy.c	This script evaluates the binding enthalpy with MOPAC: the calculation requires at least two molecules in the workspace and if there are more than two molecules or the ligand is ambiguous, the script asks to specify the molecule ID of the ligand; the receptor is simplified keeping only the residues included in a spheroid of 3 Å around the ligand. The user can change this value in the script (VS_PROXIMITY constant); the complex geometry is optimized until the termination criteria GNORM is achieved. By default, this value is set to 10 (see VS_MOPACKEYS_MIN constant in the script); the heat of formation is evaluated for both ligand and receptor separated and complexed; the binding enthalpy is obtained by subtracting the two energies; the results are shown in VEGA ZZ console and copied to the clipboard, if requested by the user. This script requires at least Mopac 2012 for Windows that is not included in VEGA ZZ package. For more information, see Installation of optional components.
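The final subtraction can be written out explicitly. The sketch below assumes the usual convention in which the heats of formation of the separated receptor and ligand are summed and subtracted from that of the complex; the function name and the numbers in the usage note are illustrative only.

```python
def binding_enthalpy(hof_complex, hof_receptor, hof_ligand):
    """Binding enthalpy from heats of formation, all in the same unit (e.g. kcal/mol)."""
    return hof_complex - (hof_receptor + hof_ligand)
```

For example, heats of formation of -1250, -1100 and -120 for complex, receptor and ligand give a binding enthalpy of -30 in the same unit.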

Rescore+.c	This script recalculates the interaction scores between a given set of ligand poses in a database and a target molecule or, alternatively, between a ligand and a receptor both included in a trajectory file. To run the calculation, you must specify the Receptor file name, the Database including the docked ligands, the CSV output file to store the scores, the Log file in which the errors are written and finally one or more scoring functions. For more information about the scoring functions, you can consult the VEGA ZZ manual. WARNING: the database must contain ligand poses obtained by a previous docking calculation. This script doesn't perform any kind of docking calculation. To calculate the RPScore, the ligand must be a peptide/protein with the residue names indicated in the sequence.

RPScore.c	It calculates the RPScore of a given protein-protein complex or a trajectory of protein-protein complexes. In this second case, the results are saved to a CSV file. The complex or the trajectory must be open in the current workspace. This script is the VEGA ZZ implementation of the well known RPScore program. For more details: http://www.sbg.bio.ic.ac.uk/docking/rpscore.html Gidon Moont, Henry A. Gabb, and Michael J.E. Sternberg, "Use of Pair Potentials Across Protein Interfaces in Screening Predicted Docked Complexes", PROTEINS: Structure, Function, and Genetics 35:364-373 (1999).

WarpEngine GRAMM extractor.c	This script extracts one or more complexes from the output generated by the GRAMM docking software used in the WarpEngine parallel execution environment. To complete the extraction, the script asks you for: the database containing the ligands that were docked; the receptor file in any format supported by VEGA ZZ; the WarpEngine GRAMM output file (it has the .csv file extension); the output directory; the number of top ranked complexes that you want to extract. By default, the script saves the complexes in IFF format and assigns the CHARMM force field and Gasteiger-Marsili atom charges. These default parameters can be changed by editing the script code.

X-Score.c	It evaluates the interaction score of a given ligand - receptor complex. This calculation requires at least two molecules in the workspace and if there are more than two molecules or the ligand is ambiguous, the script asks to specify the molecule ID of the ligand. The results are shown in the VEGA ZZ console and copied to the clipboard. This script requires X-Score 1.2 or 1.3 for Windows, which is not included in the VEGA ZZ package. For more information about X-Score, visit http://www.sioc-ccbg.ac.cn/
To install the X-Score package in the VEGA ZZ environment:
Open the following Web site: http://www.sioc-ccbg.ac.cn/?p=42&software=xscore
Complete the on-line registration form.
Log in with your credentials and download the X-Score package for the Windows platform.
Open the tar file with WinRAR or other software able to unpack gzipped tar files.
Extract xscore_win32.exe from xscore_win32\bin to ...\VEGA ZZ\Bin\Win32, where ...\VEGA ZZ is the VEGA ZZ installation path (usually C:\Program Files\VEGA ZZ).
Rename xscore_win32.exe to xscore.exe.
Extract the parameter directory from xscore_win32 to the ...\VEGA ZZ\Data directory. This last directory is hard to identify, because every Windows version creates it in a different place. To find it, open the VEGA console from the Start menu and type OpenDataDir.
Rename parameter to Xscore.
Now you are ready to use the X-Score script. If you want to use xscore.exe from the command prompt, open the VEGA console and use the xs command, a shell script that sets the environment variable required by X-Score.

13.3.14 Examples

This directory includes the example scripts:

Benzene	In this folder, you can find several examples showing you how to build a benzene ring using different scripting languages (C-Script, JavaScript, PHP, Python, REBOL).

HyperDrive	This folder includes C-scripts showing you how to use HyperDrive APIs.

Log kw IAM	In this folder, there are minimalist codes in different scripting languages to calculate log k_w^IAM.DD2 and log k_w^IAM.MG of the molecule in the current workspace.

Command console.htm	This script demonstrates how it's possible to control VEGA ZZ by JavaScripts in a HTML page.

Demo.bat	Demo script.

Demo.r	The same as above, but written in REBOL.

Distances.r	This REBOL script explains how to measure interatomic distances.

Graph.r	Demo of the extended commands to manage the plots.

GraphApp demo.r	Demo of the GraphApp GUI library.

Info.r	It shows some information in the VEGA ZZ console.

Meshload.r	It loads and shows a 3D rabbit mesh model.

Mini-XML demo.c	Demo script of Mini-XML library.

MP3 player.r	Minimalist mp3 player (fmod demo).

NAMD minimization.c	This script shows how to use the NAMD helper to perform an energy minimization by NAMD 2. It requires only a molecule in the current workspace.

REBOL View\VEGA ZZ toolbar.r	It shows a REBOL/View toolbar to control the VEGA ZZ main features.

Requesters.r	Simple demo of the VEGA ZZ built-in requesters.

VEGA GL.c	Application example of VEGA GL commands.

13.3.15 File conversion

This directory includes scripts for file format conversion:

CSSR SOMFA export.c	This script exports the current molecule in CSSR format readable by SOMFA.

CSV export.c	It saves the current molecule in Comma Separated Values (CSV) format.

Format conversion.r	This script performs the batch file format conversion of all molecules contained in a folder. Some parameters can be changed in the dialog window:
Source dir.	Name of the source directory containing the files to convert. Click the Open button to show the directory requester.
Destination dir.	Name of the destination directory in which all converted files will be placed. Click the Open button to show the directory requester.
Output format	Use this list to select the output format.
Compression	Compression method (default none).
Add hydrogens	Method used to add the hydrogens:
- None: no hydrogens will be added.
- Generic: generic organic geometry-based method.
- Generic BO: generic organic bond order-based method.
- Nucleic acid: nucleic acid geometry-based method.
- Nuc. acid BO: nucleic acid bond order-based method.
- Protein: protein geometry-based method.
- Protein BO: protein bond order-based method.
Include the connectivity	If checked, the atom connectivity is included (if the file format supports it).
Include the atom constraints	If checked, the atom constraints are saved into the file (if the file format supports it).
Normalize the coordinates	If checked, the molecule is translated to the axis origin (0, 0, 0).
Assign the Gasteiger/Marsili charges	If checked, the Gasteiger - Marsili atom charges are assigned.
Clicking the Convert button starts the conversion; clicking Close closes the dialog window.

PDB ren export.c	It exports the molecule in PDB format renumbering the atoms.

XYZ import.c	It imports XYZ files giving the possibility to adapt the filter to each sub-format.

13.3.16 Interaction surface

These scripts calculate and manage ligand-receptor interaction surfaces.

CHARMM interaction surface.c	It calculates the CHARMM non-bond interaction energy of each ligand-receptor atom pair and projects it on the Van der Waals surface. You must enter the molecule ID/number to indicate the ligand.

Lipophilic interaction surface.c	It calculates the lipophilic interaction of each ligand-receptor atom pair and projects it on the Van der Waals surface. You must enter the molecule ID/number to indicate the ligand.

MEP interaction surface.c	It calculates the electrostatic interaction energy of each ligand-receptor atom pair and projects it on the Van der Waals surface. You must enter the molecule ID/number to indicate the ligand.

MLPInS color ramp.c	This script normalizes the color ramp calculated by MLPInS interaction surface script, using the user-defined range of values. The normalization is useful to compare surfaces of different molecules using the same color scheme. It recognizes MLPInS surfaces only and changes them selectively.

MLPInS interaction surface.c	It calculates the MLP Interaction Score (MLPInS) of each ligand-receptor atom pair and project it on the Van der Waals surface. The user must enter the molecule ID/number to indicate the ligand.

13.3.17 Movie

Scripts to create movies.

Movie maker.c	This script generates a movie file starting from the molecule in the current workspace, rotating it around one or more axes. The parameters that you can change are: Output movie (file name of the output movie), Number of frames (number of frames to put in the trajectory), Preview (if checked, the animation is shown in the main window without saving the output movie), X rotation (rotation in degrees around the X axis), Y rotation (rotation in degrees around the Y axis) and Z rotation (rotation in degrees around the Z axis). Click Animate to create the movie. The codec requester is shown to select the required compression options. Take care when choosing the Render mode because not all graphics cards support the Hardware mode. Software rendering is the most reliable, even if it cannot reach the quality of the Hardware mode.

Sec. structure anim.c	This script generates a movie file starting from the peptide in the current workspace, changing the secondary structure. The parameters that you can change are: Output movie (file name of the output movie), Number of frames (number of frames to put in the animation), Preview (if checked, the animation is shown in the main window without saving the output movie), Start Phi (starting value of the Phi dihedral angle), Start Psi (starting value of the Psi dihedral angle), Start Omega (starting value of the Omega dihedral angle), End Phi (ending value of the Phi dihedral angle), End Psi (ending value of the Psi dihedral angle) and End Omega (ending value of the Omega dihedral angle). Click Animate to create the movie file. The codec requester is shown to select the required compression options. Take care when choosing the Render mode because not all graphics cards support the Hardware mode. Software rendering is the most reliable, even if it cannot reach the quality of the Hardware mode. For the most common Phi, Psi and Omega values, click here.

13.3.18 Protein tools

This directory includes the visualization scripts:

Aminoacid selector.r	It shows the amino acids by selection and/or by chemical/physical properties.

Dump backbone torsions.c	It dumps the phi and psi backbone torsions of a protein.

Fasta to text.r	It converts a Fasta file into a text file. This is useful to load it into Microsoft Excel.

HIS protonantion.c	It finds the histidine protonation state (on NE2 or on ND1) using the CHARMM potential and swaps the hydrogens (e.g. H-NE2 to H-ND1) according to the hydrogen bond energy. If the energy difference between the H-NE2 and H-ND1 tautomers is more than 2.0 kcal/mol, the hydrogen is placed on the nitrogen that gives the structure with the lower hydrogen bonding energy. The starting structure must include the hydrogens.

Move hydrogens to end.c	This script moves the hydrogen atoms to the end of the atom list. In this way, you can obtain files split in two parts: the first one containing the heavy atoms and the second one, placed at the end, containing the hydrogens. For example, this is useful to write mol2 files compatible with the GOLD docking system.

Score.c	It calculates the interaction score between a ligand and a generic target biomacromolecule. The ligand must be previously docked into the target structure. This calculation requires two molecules in the workspace: the first one must be the receptor and the second one must be the ligand. The script can calculate:
	- Electrostatic energy (Coulomb).
	- Electrostatic energy with distance-dependent dielectric constant.
	- R6-R12 Lennard-Jones non-bond energy using the CHARMM and CVFF force fields.
	- Hydrophobic interaction using the Broto-Moreau parameters with different distance functions (linear, square, cube and Fermi's function).
The results are automatically copied to the clipboard.
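
As an illustration of the two electrostatic terms listed for Score.c, the following Python sketch sums the Coulomb energy over all receptor-ligand atom pairs. It is not the code of the script: the atom representation (x, y, z, charge), the conversion constant 332.0636 (kcal·Å/(mol·e²)) and the choice of a dielectric equal to the distance (ε = r) for the distance-dependent variant are assumptions made for this example.

```python
import math

# Assumed conversion factor: energies in kcal/mol, distances in Å, charges in e.
COULOMB_CONSTANT = 332.0636

def electrostatic_energy(receptor, ligand, distance_dependent=False):
    """Sum the pairwise Coulomb energy between receptor and ligand atoms.

    Each atom is a tuple (x, y, z, charge). With distance_dependent=True the
    dielectric constant is assumed to be equal to the pair distance (eps = r).
    """
    energy = 0.0
    for xr, yr, zr, qr in receptor:
        for xl, yl, zl, ql in ligand:
            r = math.dist((xr, yr, zr), (xl, yl, zl))
            eps = r if distance_dependent else 1.0
            energy += COULOMB_CONSTANT * qr * ql / (eps * r)
    return energy

# Two opposite unit charges placed 2 Å apart.
receptor = [(0.0, 0.0, 0.0, 1.0)]
ligand = [(2.0, 0.0, 0.0, -1.0)]
print(electrostatic_energy(receptor, ligand))
print(electrostatic_energy(receptor, ligand, distance_dependent=True))
```

With ε = r the energy decays as 1/r², which damps the long-range contributions compared with the plain Coulomb term.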

13.3.18.1 Homology modelling services

This folder includes on-line services for homology modelling.

FUGUE.htm	FUGUE is a program for recognizing distant homologues by sequence-structure comparison. It utilizes environment-specific substitution tables and structure-dependent gap penalties, where scores for amino acid matching and insertions/deletions are evaluated depending on the local environment of each amino acid residue in a known structure. Given a query sequence (or a sequence alignment), FUGUE scans a database of structural profiles, calculates the sequence-structure compatibility scores and produces a list of potential homologues and alignments. For more information, visit this Web site: http://tardis.nibio.go.jp/fugue/

I-TASSER.htm	I-TASSER server is an Internet service for protein structure and function predictions. 3D models are built based on multiple-threading alignments by LOMETS and iterative TASSER assembly simulations; function insights are then derived by matching the predicted models with protein function databases. I-TASSER (as 'Zhang-Server') was ranked as the No 1 server for protein structure prediction in the CASP7, CASP8 and CASP9 experiments. It was also ranked as the best for function prediction in CASP9. The server is in active development with the goal to provide the most accurate structural and function predictions using state-of-the-art algorithms.

Phyre 2.htm	Protein Homology/analogY Recognition Engine.

ROBETTA.htm	Robetta provides both ab initio and comparative models of protein domains. It uses the ROSETTA fragment insertion method (Simons et al. (1997) J Mol Biol. 268:209-225). Domains without a detectable PDB homolog are modeled with the Rosetta de novo protocol (Bonneau et al. (2002) J Mol Biol. 322:65-78). Comparative models are built from Parent PDBs detected by UW-PDB-BLAST or HHSEARCH and aligned by various methods which include HHSEARCH, Compass, and Promals. Loop regions are assembled from fragments and optimized to fit the aligned template structure (Rohl et al. (2004) Proteins 55:656-677). The procedure is fully automated. For more information, visit this Web site: http://robetta.bakerlab.org/

SWISS-MODEL.htm	SWISS-MODEL is a fully automated protein structure homology-modeling server, accessible via the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer). The purpose of this server is to make Protein Modelling accessible to all biochemists and molecular biologists worldwide. For more information about the service, visit: http://swissmodel.expasy.org/

13.3.19 PubChem

PubChem-related scripts. They require an Internet connection.

13.3.19.1 PubChem database rename

Scripts to rename the molecules in a database.

By CID.c	This script renames all molecules in a database according to their CID code. A log file containing the errors is automatically created in the database directory by adding "- rename.log" as suffix to the database file name.

By IUPAC.c	This script renames all molecules in a database according to their IUPAC name. A log file containing the errors is automatically created in the database directory by adding "- rename.log" as suffix to the database file name.

By name.c	This script renames all molecules in a database according to their most common name in PubChem. A log file containing the errors is automatically created in the database directory by adding "- rename.log" as suffix to the database file name.

13.3.19.2 PubChem download

Multiple by CID.c	This script downloads multiple molecules to a directory by specifying their CIDs in a CSV file (with semicolon-separated fields). This file must contain the labels in the first line, the CIDs in the first column and, optionally, a second column with the molecule names that are used for the files. The molecules are downloaded in 3D SDF format and, if an error occurs, it is reported in the log file that has the same prefix as the CSV file and " - download.log" as suffix.
CSV file example with CIDs only:
	CID
	10075246
	10110916
	10111186
	10114637
CSV file example with CIDs and names:
	CID;Name
	10075246;"Mol 1"
	10110916;"Mol 2"
	10111186;"Mol 3"
	10114637;"Mol 4"

Multiple by name.c	This script downloads multiple molecules to a directory by specifying their names in a text file. This file must contain the names of the molecules to download, one per line. The molecules are downloaded in 3D SDF format and, if an error occurs, it is reported in the log file that has the same prefix as the input file and " - download.log" as suffix.
Text file example:
	Ethanol
	Benzene
	Aspirin
	Phenol
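
The input layout required by Multiple by CID.c can be read with a few lines of Python. This is only a sketch of the file format described above, not the code of the script; the function name read_cid_table and the fallback of using the CID as file name when the second column is missing are assumptions.

```python
import csv
import io

def read_cid_table(text):
    """Parse a semicolon-separated CID list whose first line holds the labels."""
    rows = csv.reader(io.StringIO(text), delimiter=";")
    next(rows, None)  # skip the label line
    entries = []
    for row in rows:
        if not row or not row[0].strip():
            continue  # ignore empty lines
        cid = int(row[0])
        # Assumed fallback: use the CID itself when no name column is given.
        name = row[1].strip() if len(row) > 1 and row[1].strip() else str(cid)
        entries.append((cid, name))
    return entries

example = 'CID;Name\n10075246;"Mol 1"\n10110916;"Mol 2"\n'
print(read_cid_table(example))
```

The csv module removes the double quotes around the names, so "Mol 1" is returned as Mol 1.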

Single by CID.c	It downloads a structure from PubChem to the current workspace by specifying the CID code. If the code is wrong, an error message is shown.

Single by name.c	It downloads a structure from PubChem to the current workspace by specifying its name. If the molecule is not available, an error message is shown.

13.3.19.3 PubChem get

CID.c	This script asks PubChem for the CID code of the molecule in the current workspace. If the molecule is not included in the database, an error message is shown. The molecule is identified by submitting its SMILES string.

IUPAC name.c	This script asks PubChem for the IUPAC name of the molecule in the current workspace. If the molecule is not included in the database, an error message is shown. The molecule is identified by submitting its SMILES string.

Name.c	This script asks PubChem for the name of the molecule in the current workspace. If the molecule is not included in the database, an error message is shown. The molecule is identified by submitting its SMILES string.

Multiple IUPAC names.c	This script asks PubChem for the IUPAC names of the molecules by specifying their CIDs in a CSV/text file. The first line of this file can be the column label (not mandatory). The IUPAC names are stored in a CSV file that can be specified by the user.
Input file example:
	CID
	243
	3339
	128563
	3236
Output file example:
	CID;IUPAC
	243;"benzoic acid"
	3339;"propan-2-yl 2-[4-(4-chlorobenzoyl)phenoxy]-2-methylpropanoate"
	128563;"methyl (2S,4aR,6aR,7R,9S,10aS,10bR)-9-acetyloxy-2-(furan-3-yl)-6a,10b-dimethyl-4,10-dioxo-2,4a,5,6,7,8,9,10a-octahydro-1H-benzo[f]isochromene-7-carboxylate"
	3236;"1-(4-ethylphenyl)-2-methyl-3-piperidin-1-ylpropan-1-one"

XLogP.c	This script gets the XLogP value of the molecule in the current workspace from PubChem. If the molecule is not included in the database, an error message is shown. The molecule is identified by submitting its SMILES string.

13.3.20 QSAR

Scripts for QSAR.

Data normalizer.c	This script normalizes the values of the specified columns of a given spreadsheet in CSV format to the 0-1 range, assuming that the first row is the header of each column. The output spreadsheet is saved by adding the "- normalized.csv" extension to the file name.
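
A common way to bring a column into the 0-1 range is min-max scaling. The sketch below shows this technique in Python; whether Data normalizer.c uses exactly this formula is an assumption, as is the handling of a constant column.

```python
def normalize_column(values):
    """Rescale the values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information and is mapped to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(normalize_column([2.0, 4.0, 6.0]))  # prints [0.0, 0.5, 1.0]
```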

Principal component analysis.c	This script performs the Principal Component Analysis of a given dataset in CSV format. You can choose the columns to include in the matrix to be analyzed. The script saves two files: the first one includes statistical data for each selected column, such as the mean of the values and their standard deviations, and the PCA results, such as the eigenvalues, the eigenvectors and the coefficients to project the data in the PCA space; the second one includes the projected values. The PCA calculation is done only for the first three principal components.

Table join.c	This script joins two or more tables in CSV format. This is useful when the number of columns/rows is too large to be managed by Microsoft Excel. You can select an unlimited number of tables/spreadsheets and you can specify the join position (Bottom or Right) for each of them. The output file is automatically saved when you stop adding spreadsheets by clicking Cancel in the file requester, and its name is obtained from the first file by adding the - join.csv extension.
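
The two join positions can be pictured with the following Python sketch, in which a table is a list of rows and each row is a list of cell strings. It is an illustration only: padding short rows with empty cells and appending the second table unchanged (header included) in the Bottom case are assumptions, not documented behavior of Table join.c.

```python
def join_tables(first, second, position):
    """Join two tables (lists of rows); position is 'Bottom' or 'Right'."""
    if position == "Bottom":
        # Assumption: the rows of the second table are appended unchanged.
        return first + second
    if position == "Right":
        width_a = max((len(r) for r in first), default=0)
        width_b = max((len(r) for r in second), default=0)
        joined = []
        for i in range(max(len(first), len(second))):
            a = first[i] if i < len(first) else []
            b = second[i] if i < len(second) else []
            # Pad short or missing rows with empty cells to keep the columns aligned.
            joined.append(a + [""] * (width_a - len(a)) + b + [""] * (width_b - len(b)))
        return joined
    raise ValueError("position must be 'Bottom' or 'Right'")

left = [["Name", "LogP"], ["Mol 1", "1.2"]]
right = [["Score"], ["-7.5"]]
print(join_tables(left, right, "Right"))
```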

Training and test set creator.c	This script helps you to create random training and test sets from a given data set in CSV format. This is useful to validate a QSAR model, by calculating the linear regression of the training set and using the test set to predict the dependent variable. On request (the script asks you), the two sets can be made homogeneous in terms of mean and standard deviation, and you can select the properties that you want to keep homogeneous in both sets. The script standardizes the data and performs several trials to split the data set randomly. The iterative process stops when the differences between the training and test sets, in terms of the mean of the property means and the mean of the property standard deviations, are less than a user-defined value (0.01). This script writes two CSV files as output, one for the training set and one for the test set, adding "- training" and "- test" to the file name, respectively.
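
The iterative splitting described for Training and test set creator.c can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script's implementation: the function names, the use of the population standard deviation, the max_trials limit and the seed argument are all hypothetical.

```python
import random
import statistics

def standardize(column):
    """Scale a column to zero mean and unit standard deviation (it must not be constant)."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(v - mean) / sd for v in column]

def summary(columns, rows):
    """Return the mean of the column means and the mean of the column standard deviations."""
    means = [statistics.fmean([col[i] for i in rows]) for col in columns]
    sds = [statistics.pstdev([col[i] for i in rows]) for col in columns]
    return statistics.fmean(means), statistics.fmean(sds)

def split_homogeneous(columns, n_train, tol=0.01, max_trials=10000, seed=0):
    """Try random splits until training and test sets differ by less than tol."""
    rng = random.Random(seed)
    z = [standardize(c) for c in columns]
    indices = list(range(len(columns[0])))
    for _ in range(max_trials):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        m_train, s_train = summary(z, train)
        m_test, s_test = summary(z, test)
        if abs(m_train - m_test) < tol and abs(s_train - s_test) < tol:
            return sorted(train), sorted(test)
    return None  # no acceptable split found within max_trials

data = [[float(i) for i in range(40)], [float((i * 7) % 11) for i in range(40)]]
print(split_homogeneous(data, 30, tol=0.2))
```

A smaller tolerance gives more similar sets but may require many more random trials, especially for small data sets.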

13.3.20.1 Linear regression

Scripts for linear regression.

Automatic linear regression.c	This script automatically generates all possible multiple regression models through these steps:
	1. Selection of the best independent variables by calculating the corresponding equation with a single regressor. An independent variable is automatically excluded if its regression has an R² value less than 0.10. If the number of found variables is less than the maximum number of regressors, the "desperate mode" is automatically enabled and 50% of the best variables are selected.
	2. Identification of collinear independent variables by calculating the Variance Inflation Factor (VIF) value for each regressor pair. Variable pairs with VIF > 5.0 are considered collinear and are not considered in the model calculation.
	3. Calculation of the models with a number of regressors from one to a user-defined value (default 3). For each model, a cross-validation procedure (leave-one-out) is performed and the prediction power is shown as Q². If the number of observations is more than 200, the script asks to confirm the cross-validation.
The script requires a CSV file as input, which can be exported from your preferred spreadsheet software (e.g. Microsoft Excel), and generates an output file with the same prefix as the input followed by - regression.txt. The output file includes information such as the best independent variables, the collinear variable pairs, all regression models and the best regression models (three for each number of regressors).
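
The collinearity test and the leave-one-out cross-validation mentioned for Automatic linear regression.c can be illustrated with the Python sketch below. It relies on the standard definitions, which are assumed here rather than taken from the script: for a pair of regressors VIF = 1 / (1 - r²), where r is the Pearson correlation coefficient, and Q² = 1 - PRESS / SStot. For brevity, Q² is computed for a model with a single regressor.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def pair_vif(x, y):
    """Variance Inflation Factor of a regressor pair: 1 / (1 - r^2)."""
    r2 = pearson_r(x, y) ** 2
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)

def fit_line(x, y):
    """Least-squares slope and intercept of y = slope * x + intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def loo_q2(x, y):
    """Leave-one-out Q^2: each point is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    my = sum(y) / len(y)
    return 1.0 - press / sum((b - my) ** 2 for b in y)

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 8.5]
print(pair_vif(x1, x2) > 5.0)  # the two columns are nearly proportional
```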

Linear regression.c	This script performs the multiple linear regression and requires a CSV file as input. In two steps, you can select the dependent variable (usually the activity) and the independent variables from the list built from the first row in the spreadsheet.

Model validator.c	This script validates QSAR models by randomly splitting the whole dataset into a number of training and test set pairs. For each training set, the regression coefficients are calculated to evaluate the test set in terms of standard deviation of errors, angular coefficient, intercept and r² of the trend line of the chart of the predicted vs. experimental activities. To use this script, you must specify the file containing the data of the regression analysis, which must be in CSV format and can be easily exported from your preferred spreadsheet. Then, you must select the dependent variable (usually the activity) and the independent variables of the QSAR model that you have found previously, for example by the Automatic Linear Regression script. Finally, you must enter the number of molecules of the training set and the number of random trials. At the end of the calculation, a CSV output file is written in the same directory as the data file by adding the "- validation.csv" suffix to the original file name. This output can be opened by a spreadsheet and it includes the following columns:
	Trial	Progressive number of the trial.
	Rsq	Multiple correlation coefficient of the training set (r²).
	RsqAdj	Adjusted r² of the training set.
	PC	Amemiya Prediction Criterion of the training set.
	P	Probability of the training set.
	F	Fisher F statistic for regression of the training set.
	StdDevOfErrs	Standard deviation of errors (SE) of the training set.
	Test_MeanErr	Mean error in prediction of the test set.
	Test_StdErrOfErrs	Standard deviation of errors (SE) of the test set.
	Test_M	Angular coefficient of the trend line of the chart of predicted vs. experimental activities.
	Test_B	Intercept of the trend line of the chart of predicted vs. experimental activities.
	Test_Rsq	r² of the chart of predicted vs. experimental activities.
	Test_PC	Amemiya Prediction Criterion of the test set.
	Test_P	Probability of the test set.
	Test_F	Fisher F statistic for regression of the test set.
	Intercept	Intercept of the regression equations.
	Coefficients of the independent variables	The list of the coefficients for each regressor.
The output file also includes the mean (Mean) and the standard deviation (StdDev) of the previous columns and the labels of the columns selected as dependent (DepVar) and independent (InDepVar) variables.

13.3.20.2 Virtual screening

Scripts for the analysis of virtual screening results.

CSV to SVM light.c	This script converts a standard CSV file to SVM Light format. It requires the molecule names as first column and an activity / dependent variable column that you can choose by a requester. Moreover, you can also select the independent variables that are exported to the output file. For more information, read http://svmlight.joachims.org/.
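
In the SVM Light input format, each line holds the target value followed by index:value pairs with increasing feature indices starting from 1, and an optional comment introduced by #. The Python sketch below builds such a line; storing the molecule name in the comment is an assumption about what CSV to SVM light.c does with the first column, and the function name is hypothetical.

```python
def to_svmlight_line(name, activity, features):
    """Format one molecule as an SVM Light line: target, index:value pairs, comment."""
    parts = [f"{activity:g}"]
    # Feature indices start from 1 and must be in increasing order.
    parts += [f"{i}:{v:g}" for i, v in enumerate(features, start=1)]
    # Assumption: the molecule name is kept as a trailing comment.
    return " ".join(parts) + f" # {name}"

print(to_svmlight_line("Mol 1", 1, [0.5, 2.0]))  # prints 1 1:0.5 2:2 # Mol 1
```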

Enrichment factor analysis.c	This script helps to set up a virtual screening calculation by analyzing the enrichment factor that you can obtain by screening sets including true-active and decoy molecules. The data must be in CSV file format and you can select the activity and score columns. Moreover, you can also specify the activity threshold to indicate when a molecule must be considered active or not and the cluster size for the cluster analysis. The script sorts the rows in ascending order on the basis of the score/property used to predict the activity, then performs the cluster analysis, showing the results in a bar plot. If the score/property can successfully detect the active compounds, they must be ranked at the top of the sorted list, populating the first clusters. The enrichment quality is evaluated in terms of skewness and kurtosis. In particular, a kurtosis value close to zero indicates a Gaussian distribution, whereas a high value indicates an asymmetric curve. The aim of this kind of analysis is to obtain a highly asymmetric curve shifted to the left of the plot, and this result is obtained when the kurtosis value is high. As a rule of thumb, kurtosis values less than 5 can be considered poor, while values greater than 5 are good.

Enrichment factor optimizer (manual).c	This script can be used to improve the enrichment factors of a virtual screening analysis. In more detail, it yields a new scoring function resulting from the linear combination of two or more user-defined descriptors, such as docking scores and molecular properties. The coefficients of this first-degree equation are calculated by maximizing the number of active compounds at the top of the list in which the molecules are ranked by the score calculated through the new equation. The maximization is performed by the gradient-free Hooke-Jeeves algorithm and, in order to avoid local maxima, a random sampling is also applied. As input, a CSV file is required, containing one activity column and several score/property columns that you must select. Moreover, you must also specify the activity threshold to indicate when a molecule must be considered active or not. The output is shown in the VEGA ZZ console as in the following example:

File name.....................: bestranking.csv
Activity column...............: ACTIVITY
Activity range................: 0.00 - 1.00
Activity threshold............: 0.50
Number of molecules...........: 2513
Number of active molecules....: 38
Max. minimization steps.......: 5000
RMS to stop minimization......: 0.001
Random sampling steps.........: 36
Random selection probability..: 1.51 %

Score = 1.0000 SCORE_0000 + 0.2309 SCORE_+000 - 0.5851 SCORE_0+00

Top %  Mols   Act   Act %      EF
=================================
 1.00    25     4   16.00   10.58
 2.00    50     6   12.00    7.94
 5.00   125    13   10.40    6.88
10.00   251    18    7.17    4.74
20.00   502    24    4.78    3.16

The coefficients of the equation are divided by the coefficient of the first term.
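
The EF column of the example output is consistent with the usual definition of the enrichment factor: the fraction of active molecules in the top part of the ranked list divided by the fraction of active molecules in the whole set. The Python sketch below applies this definition; the function name and the truncation of the number of top molecules to an integer are assumptions inferred from the example values, not taken from the script.

```python
def enrichment_factor(active_flags, top_percent):
    """Enrichment factor of the top part of a list ranked from best to worst score.

    active_flags holds one boolean per molecule (True = active).
    """
    n_total = len(active_flags)
    n_top = int(n_total * top_percent / 100.0)  # e.g. 1 % of 2513 molecules -> 25
    hit_rate_top = sum(active_flags[:n_top]) / n_top
    hit_rate_all = sum(active_flags) / n_total
    return hit_rate_top / hit_rate_all

# Same totals as the example: 2513 molecules, 38 actives, 4 of them in the top 25.
flags = [True] * 4 + [False] * 21 + [True] * 34 + [False] * (2513 - 59)
print(round(enrichment_factor(flags, 1.0), 2))  # prints 10.58
```

The value 38 / 2513 = 1.51 % is also the "Random selection probability" reported in the example, so an EF of 10.58 means that the top 1 % of the list is about ten times richer in active molecules than a random selection.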

Enrichment factor optimizer.c	This script uses the same approach as Enrichment factor optimizer (manual).c, introducing the automatic selection of the variables to obtain the best mathematical models in terms of enrichment factors. You can select the activity, the independent variables/molecular descriptors and the scores to be combined to obtain the maximum enrichment factor. Although this script has a parallel design, it could require a long time to complete the calculation, especially when you select a large number of equation terms (more than three). You can also specify the threshold for the detection of active and inactive compounds, the number of variables used to build the models and the cluster size for the cluster analysis. The results are sorted from best to worst enrichment factor and saved in a CSV file that can be analyzed by your preferred spreadsheet. The output file (named prefix - model.csv) includes several columns:
	ModID	Identification number of the models, which are ranked by ModelScore (from the best to the worst).
	NV	Number of variables/scores used to build the model.
	Active_N	Number of active molecules in the first N percentile.
	ActivePerc_N	Percentage of the active molecules in the first N percentile.
	EF_N	Enrichment factor in the first N percentile.
	Kurtosis	Kurtosis of the histogram profile obtained by cluster analysis.
	Skewness	Skewness of the histogram profile obtained by cluster analysis.
	ModelScore	Score of the model (larger = better).
	Model	Equation of the model with calculated coefficients (if you selected more than one score).
	ScoreMin	Minimum score evaluated by the model.
	ScoreMax	Maximum score evaluated by the model.
	Clustrer_N	Percentage of active molecules in each cluster.
This script also performs the validation of the best models by building five pairs of training and external sets (with 70/30 % ratio) from the starting dataset. The training set is used to recalculate the models and the external set to predict the activity. The results of this analysis are saved to prefix - valitadion.csv, which contains the same data as for the models obtained from the whole dataset, with the exception of the population of the clusters. The column headers are prefixed with ts and es to identify the training and the external sets, respectively.

13.3.21 Trajectory

It contains scripts for trajectory management.

Anim maker.c	This script generates a trajectory file starting from the molecule in the current workspace, rotating it around one or more axes. This is useful to create video files. The parameters that you can change are:
	Output trajectory	File name of the output trajectory. In the file requester, it is possible to select the output format (default Gromacs XTC).
	XTC comp. (1-6)	Gromacs XTC compression ratio. It is meaningful only if the Gromacs XTC format is selected as output (default 3).
	Save the animation	Check this gadget to save/render the animation (e.g. avi, mpeg, etc).
	Number of frames	Number of frames to put in the trajectory (default 50).
	X rotation	Rotation in degrees around the X axis (default 0). Negative values are allowed.
	Y rotation	Rotation in degrees around the Y axis (default 360). Negative values are allowed.
	Z rotation	Rotation in degrees around the Z axis (default 0). Negative values are allowed.
	Animate	Push this button to create the animation trajectory.
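
The relationship between the total rotation and the number of frames can be pictured with the Python sketch below, which spreads a rotation around the Y axis evenly over the frames. It is only an illustration: how Anim maker.c distributes the angles (for example, whether the last frame reaches the full rotation) and the sign convention of the rotation are assumptions.

```python
import math

def frame_angles(total_degrees, n_frames):
    """Rotation angle of each frame, assuming equal increments starting from 0."""
    step = total_degrees / n_frames
    return [step * i for i in range(n_frames)]

def rotate_y(point, degrees):
    """Rotate a point (x, y, z) around the Y axis by the given angle."""
    x, y, z = point
    a = math.radians(degrees)
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))

# Default-like setup: a full turn around Y, here with only 4 frames for brevity.
for angle in frame_angles(360.0, 4):
    print(angle, rotate_y((1.0, 0.0, 0.0), angle))
```

With a 360 degree rotation, stopping one increment before the full turn avoids a duplicated frame when the animation is played in a loop.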

APBS trajectory.c	This script calculates the solvation energy for each frame included in a MD trajectory and saves the values in a CSV file. It uses APBS for Windows, which is included in the VEGA ZZ package. APBS is software for modeling biomolecular solvation through the solution of the Poisson-Boltzmann equation (PBE), developed by Nathan Baker in collaboration with J. Andrew McCammon and Michael Holst. For more information about APBS, visit http://www.poissonboltzmann.org/apbs/

Automatic quenching.r	This script extracts the frames from a trajectory file, then minimizes them using AMMP or Mopac. The results are stored in an output trajectory file. You can enter the following parameters:
	Input molecule	File name of the input molecule. When you select a molecule using the Open button, the Input trajectory, Output trajectory and Output energy fields are automatically updated.
	Input trajectory	File name of the input MD trajectory. When you select a new trajectory using the Open button, the Output trajectory and Output energy fields are automatically updated.
	Output trajectory	File name of the output trajectory. When you select a new trajectory using the Open button, the Output energy field is automatically updated. In the file requester, you can select the output format.
	XTC comp. (1-6)	Gromacs XTC compression ratio. It is meaningful only if the Gromacs XTC format is selected as output.
	Output energy file	File in which the energy values are stored (CSV format). It is available only if Mopac is selected as Minimization type.
	First frame	Trajectory frame from which the quenching starts.
	Last	Trajectory frame at which the quenching ends.
	Step	Increment for the frame enumeration.
	Minimization type	This field selects the minimization type: None (nothing is performed), AMMP (molecular mechanics method based on the conjugate gradients algorithm) and Mopac (semiempirical method).
	AMMP min. steps	Number of minimization steps used by AMMP.
	AMMP toler	Convergence criterion used by AMMP to stop the minimization.
	Mopac keywords	In this field, you can enter the keywords to control the Mopac calculation.
	Calculate	Press this button to perform the quenching. If a parameter is incorrect or missing, an error message is shown.

DCD fix for VMD.c	All pre-3.0.0 VEGA ZZ releases write buggy DCD files that aren't readable by VMD. This script fixes the problem by patching the DCD trajectory only if the problem is detected.

Dump energy.c	This script calculates the energy for each MD frame and dumps the molecular mechanics energy components in a CSV file. It also performs a histogram analysis. The parameters are:
	Input molecule	File name of the input molecule. When you select a molecule using the Open button, the Input trajectory, Output trajectory and Output energy fields are automatically updated.
	Input trajectory	File name of the input MD trajectory. When you select a new trajectory using the Open button, the Output trajectory and Output energy fields are automatically updated.
	Output energy	Output energy file in CSV format (Comma Separated Values). The columns contain the following data: frame number, bond, angle, torsion, hybrid, non-bond and total energies.
	Output histogram	Output histogram in CSV format.
	First frame	Trajectory frame from which the quenching starts.
	Last	Trajectory frame at which the quenching ends.
	Step	Increment for the frame enumeration.
	Minimization type	This field selects the minimization type: None (nothing is performed), AMMP (molecular mechanics method based on the conjugate gradients algorithm) and Mopac (semiempirical method).
	AMMP min. steps	Number of minimization steps used by AMMP.
	AMMP toler	Convergence criterion used by AMMP to stop the minimization.
	Mopac keywords	In this field, you can enter the keywords to control the Mopac calculation.
	Calculate	Press this button to perform the quenching. If a parameter is incorrect or missing, an error message is shown.

Enantiomerizer.r	It converts the trajectory to another format, inverting all chiral atoms. You can specify the following parameters:
	Input traj.	File name of the input trajectory. Clicking the Open button, the file requester is shown.
	Output traj.	File name of the output trajectory.
	Output format	File format of the output trajectory.
	Compression	Compression level. It has an effect only if the XTC format is selected.
	Append if the file exists	If it's checked and the output trajectory exists, the converted frames are appended.
	Consider selected atoms only	If it's checked, only the active (visible) atoms are saved into the new trajectory.
	Swap endian	If it's checked, the endianness of the converted trajectory is swapped. This function has an effect only if the DCD format is selected.
Click the Go ! button to start the conversion and the Cancel button to close the window.

Frame extractor.r	It extracts the frames from a trajectory file (Input Traj.), saving them in the specified directory (Output Dir.). You can change Quenching step, Output format and Compression method.

NAMD SMD force plot.c	This script shows the force/frame, force/distance and distance/frame of a steered molecular dynamics simulation by reading the NAMD output file.

PELE PDB fix.c	This script fixes the non-standard PDB files generated by PELE to be read by VEGA ZZ. For more information about PELE, click here.

Ramachandran.c	This script performs the Ramachandran analysis for each trajectory frame. Before running it, you must open a trajectory file. For each frame, the Phi and Psi backbone torsion angles are measured and checked to see whether they fall inside or outside the allowed regions of the Ramachandran plot. The percentage of residues inside the allowed regions, referred to the total number of residues, is calculated for each frame and these values are shown in a plot. This calculation is useful to highlight the evolution of the secondary structure during a MD simulation. If the percentage of the residues (Phi and Psi values) inside the allowed regions decreases during the simulation, the secondary structure is getting worse. Vice versa, if the percentage grows, the secondary structure is improving.

SDF export.c	It converts the current trajectory into an SDF database. Each structure in the database corresponds to a frame in the trajectory file.

Water remover.r	It removes all water molecules from a trajectory, converting it into a PDB multimodel file. This script is obsolete and is maintained as an example only. The same function is now implemented in VEGA ZZ without external scripts.

13.3.22 Utilities

This directory includes the generic scripts. Some of these require REBOL/View.

Bin2h.c	This script for developers converts a binary file to a C header file including a byte vector or a Base64 encoded string. In this last case, to decode the data, you can use HD_Base64EncodeMem() HyperDrive function by including hdbase64.h file.
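
The kind of header produced by a binary-to-header converter can be sketched in Python as follows. This is an illustration of the two output styles (byte vector and Base64 string), not the code of Bin2h.c: the function names, the variable names and the layout of the generated C declarations are assumptions.

```python
import base64

def bin_to_c_header(data, name="data", per_line=12):
    """Return a C declaration holding the bytes of data as an unsigned char vector."""
    lines = [f"static const unsigned char {name}[{len(data)}] = {{"]
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    return "\n".join(lines)

def bin_to_c_base64(data, name="data_b64"):
    """Return a C declaration holding data as a Base64 encoded string."""
    encoded = base64.b64encode(data).decode("ascii")
    return f'static const char {name}[] = "{encoded}";'

print(bin_to_c_header(b"\x00\xff", "blob"))
print(bin_to_c_base64(b"hi", "blob_b64"))
```

The Base64 form is more compact in the source file, but the data must be decoded at run time before use.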

Calculator.r	Simple calculator (script by Ryan S. Cole).

Calendar.r	Calendar and scheduler (script by Sterling Newton).

Clock.r	Digital clock (script by Carl Sassenrath).

Console.r	It opens the REBOL console.

CPU load.c	It shows the CPU load in a small window.

Image viewer.r	Image viewer.