 BioinfoMAC: Gromacs Protein-Ligand Complex tutorial

When we are working in GROMACS, there is a series of steps that we must follow:

Once you have downloaded the structure, you can visualize it using a viewing program such as VMD, Chimera, or PyMOL.

We will prepare our system topology in two steps:

Prepare the protein topology with pdb2gmx
Prepare the ligand topology using external tools if applicable

The next command creates a topology file from the selected PDB file. In this case our PDB file is complex.pdb, a protein bound to DNA.

$ pdb2gmx -f complex.pdb

The structure will be processed by pdb2gmx, and you will be prompted to choose a force field:

Select the Force Field:
From '/usr/local/gromacs/share/gromacs/top':
 1: AMBER03 force field (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)
 2: AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995)
 3: AMBER96 force field (Kollman et al., Acc. Chem. Res. 29, 461-469, 1996)
 4: AMBER99 force field (Wang et al., J. Comp. Chem. 21, 1049-1074, 2000)
 5: AMBER99SB force field (Hornak et al., Proteins 65, 712-725, 2006)
 6: AMBER99SB-ILDN force field (Lindorff-Larsen et al., Proteins 78, 1950-58, 2010)
 7: AMBERGS force field (Garcia & Sanbonmatsu, PNAS 99, 2782-2787, 2002)
 8: CHARMM27 all-atom force field (with CMAP) - version 2.0
 9: GROMOS96 43a1 force field
10: GROMOS96 43a2 force field (improved alkane dihedrals)
11: GROMOS96 45a3 force field (Schuler JCC 2001 22 1205)
12: GROMOS96 53a5 force field (JCC 2004 vol 25 pag 1656)
13: GROMOS96 53a6 force field (JCC 2004 vol 25 pag 1656)
14: OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)
15: [DEPRECATED] Encad all-atom force field, using full solvent charges
16: [DEPRECATED] Encad all-atom force field, using scaled-down vacuum charges
17: [DEPRECATED] Gromacs force field (see manual)
18: [DEPRECATED] Gromacs force field with hydrogens for NMR

We need a force field (FF) that can handle both protein and DNA molecules. For this tutorial, we will use the AMBER99SB-ILDN force field, so type 6 at the command prompt, followed by 'Enter'. The force field will contain the information that will be written to the topology. Choose wisely!

You will then be prompted to choose a water model; I chose TIP3P.

There are many other options that can be passed to pdb2gmx. Some are listed here:

-ignh: Ignore H atoms in the PDB file; especially useful for NMR structures. Otherwise, if H atoms are present, they must be in the correct order and named exactly how GROMACS expects them to be.
-ter: Interactively assign charge states for N- and C-termini.
-inter: Interactively assign charge states for Glu, Asp, Lys, Arg, and His; assign disulfides to Cys.

You have now generated three new files: conf.gro, topol.top, and posre.itp. conf.gro (the default output name, since no -o option was given) is a GROMACS-formatted structure file that contains all the atoms defined within the force field (i.e., H atoms have been added to the amino acids in the protein). The topol.top file is the system topology. The posre.itp file contains information used to restrain the positions of heavy atoms.

The output reports the mass and electric charge of the system chain by chain. It is really important to take note of the charge of each chain, whether protein or DNA.

Processing chain 1 'A' (1262 atoms, 158 residues)
...

Total mass 17961.034 a.m.u.
Total charge 12.000 e

Processing chain 2 'B' (507 atoms, 25 residues)
...
Total mass 7605.906 a.m.u.
Total charge -24.000 e
Writing topology

Processing chain 3 'C' (491 atoms, 24 residues)
...
Total mass 7351.772 a.m.u.
Total charge -23.000 e

Total mass in system 32918.712 a.m.u.
Total charge in system -35.000 e
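The per-chain charges should add up to the system total (12 - 24 - 23 = -35). A minimal sketch of that check follows, assuming you saved the pdb2gmx screen output to a text file; the helper name sum_chain_charges and the file name pdb2gmx.log are hypothetical, not part of GROMACS.

```shell
# Sum the per-chain "Total charge" lines from saved pdb2gmx output.
# Assumption: the output was captured to a text file by the user
# (for example by redirecting the terminal output); the helper name
# is illustrative only.
sum_chain_charges() {
    # Skip the "Total charge in system" line so it is not counted twice.
    awk '$1 == "Total" && $2 == "charge" && $3 != "in" { sum += $3 }
         END { printf "%d\n", sum }' "$1"
}
```

Calling `sum_chain_charges pdb2gmx.log` on the output shown above should print the same value as the "Total charge in system" line.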

It is possible to simulate proteins and other molecules in different solvents, provided that good parameters are available for all species involved.

There are two steps to defining the box and filling it with solvent:

Define the box dimensions using editconf.
Fill the box with water using genbox.

As you become more comfortable with periodic boundary conditions and box types, I highly recommend the rhombic dodecahedron, as its volume is ~71% of the cubic box of the same periodic distance, thus saving on the number of water molecules that need to be added to solvate the protein. For an excellent summary of the many different water models, click here, but be aware that not all of these models are present within GROMACS.
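The ~71% figure can be checked from the unit-cell volumes for a given periodic image distance d, as tabulated in the GROMACS manual:

```latex
\[
V_{\text{cube}} = d^{3}, \qquad
V_{\text{dodecahedron}} = \tfrac{1}{2}\sqrt{2}\, d^{3} \approx 0.707\, d^{3},
\qquad
\frac{V_{\text{dodecahedron}}}{V_{\text{cube}}} = \frac{\sqrt{2}}{2} \approx 0.71
\]
```

So for the same minimum distance between periodic images, the dodecahedral box needs roughly 29% less solvent volume than the cubic one.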

Let's define the box using editconf:

$ editconf -f conf.gro -o complex_newbox.gro -c -bt dodecahedron -d 1.0

Now that we have defined a box, we can fill it with solvent (water). Solvation is accomplished using genbox:

$ genbox -cp complex_newbox.gro -cs spc216.gro -o complex_solv.gro -p topol.top

What genbox has done is keep track of how many water molecules it has added, which it then writes to your topology to reflect the changes that have been made. Note that if you use any other (non-water) solvent, genbox will not make these changes to your topology! Its compatibility with updating water molecules is hard-coded.


We now have a solvated system that contains a charged protein-DNA complex. The output of pdb2gmx told us that the complex has a net charge of -35e (based on its amino acid and nucleotide composition). If you missed this information in the pdb2gmx output, look at the last line of the [ atoms ] directive for each chain in your topology; the running totals should read (in part) "qtot 12", "qtot -24", and "qtot -23", which sum to -35. Since life does not exist at a net charge, we must add ions to our system.
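If scrolling to the end of an [ atoms ] section is tedious, the running total can be pulled out with a small helper. This is only a sketch: last_qtot is a hypothetical name, and it assumes the atom lines end with a "; qtot N" comment, as pdb2gmx writes them.

```shell
# Print the last running total charge ("qtot") found in a topology file.
# Assumption: atom lines carry a trailing "; qtot <value>" comment.
last_qtot() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "qtot") q = $(i + 1) }
         END { print q }' "$1"
}
```

Run it on whichever file holds the [ atoms ] section for the chain you are interested in; with several chains you would call it once per chain and add the results.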




The tool for adding ions within GROMACS is called genion. What genion does is read through the topology and replace water molecules with the ions that the user specifies. The input is called a run input file, which has an extension of .tpr; this file is produced by the GROMACS tool grompp (GROMACS pre-processor), which will also be used later when we run our first simulation. What grompp does is process the coordinate file and topology (which describes the molecules) to generate an atomic-level input (.tpr). The .tpr file contains all the parameters for all of the atoms in the system.

Use grompp to assemble a .tpr file, using any .mdp file. I use an .mdp file for running energy minimization, since they require the fewest parameters and are thus the easiest to maintain. To produce a .tpr file with grompp, we will need an additional input file, with the extension .mdp (molecular dynamics parameter file); grompp will assemble the parameters specified in the .mdp file with the coordinates and topology information to generate a .tpr file.






An .mdp file is normally used to run energy minimization or an MD simulation, but in this case is simply used to generate an atomic description of the system. An example .mdp file (the one we will use) can be downloaded here.



In reality, the .mdp file used at this step can contain any legitimate combination of parameters. I typically use an energy-minimization script, because they are very basic and do not involve any complicated parameter combinations.

Assemble your .tpr file with the following:


$ grompp -f ions.mdp -c complex_solv.gro -p topol.top -o ions.tpr


Now we have an atomic-level description of our system in the binary file ions.tpr. We will pass this file to genion:


$ genion -s ions.tpr -o complex_solv_ions.gro -p topol.top -pname NA -nname CL -np 35








When prompted, choose group 14 "SOL" (for this case) for embedding ions. You do not want to replace parts of your protein with ions.


In the genion command, we provide the structure/state file (-s) as input, generate a .gro file as output (-o), process the topology (-p) to reflect the removal of water molecules and addition of ions, define positive and negative ion names (-pname and -nname, respectively), and tell genion to add only the ions necessary to neutralize the net charge on the complex by adding the correct number of positive ions (-np 35). You could also use genion to add a specified concentration of ions in addition to simply neutralizing the system by specifying the -neutral and -conc options in conjunction. Refer to the genion man page for information on how to use these options.
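The choice between -np and -nn follows directly from the sign of the net charge. A minimal sketch, assuming monovalent ions (NA carries +1, CL carries -1) and an integer net charge; neutralizing_flag is a hypothetical helper, not a GROMACS tool.

```shell
# Print the genion option that neutralizes a given integer net charge.
# Assumption: monovalent counter-ions, so one ion cancels one unit of charge.
neutralizing_flag() {
    if [ "$1" -lt 0 ]; then
        # Negative system: add that many positive ions.
        echo "-np $(( 0 - $1 ))"
    elif [ "$1" -gt 0 ]; then
        # Positive system: add that many negative ions.
        echo "-nn $1"
    else
        # Already neutral: nothing to add.
        echo ""
    fi
}
```

For this system, `neutralizing_flag -35` prints `-np 35`, which matches the genion command above.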



The names of the ions specified with -pname and -nname were force field-specific in previous versions of GROMACS, but have been standardized as of version 4.5. The specified ion names are always the elemental symbol in all capital letters, which is the [ moleculetype ] name that is then written to the topology. Residue or atom names may or may not append the sign of the charge (+/-), depending on the force field. Do not use atom or residue names in the genion command, or you will encounter errors in subsequent steps.


The solvated, electroneutral system is now assembled. Before we can begin dynamics, we must ensure that the system has no steric clashes or inappropriate geometry. The structure is relaxed through a process called energy minimization (EM).



The process for EM is much like the addition of ions. We are once again going to use grompp to assemble the structure, topology, and simulation parameters into a binary input file (.tpr), but this time, instead of passing the .tpr to genion, we will run the energy minimization through the GROMACS MD engine, mdrun.



Assemble the binary input with grompp, using this input parameter file:


$ grompp -f minim.mdp -c complex_solv_ions.gro -p topol.top -o em.tpr



Make sure you have been updating your topol.top file when running genbox and genion, or else you will get lots of nasty error messages ("number of coordinates in coordinate file does not match topology," etc).
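When that mismatch error does appear, it helps to compare what the topology claims with what the coordinate file holds. The sketch below uses two hypothetical helpers: one lists the [ molecules ] section of the topology, the other reads the atom count from the second line of a .gro file.

```shell
# List molecule names and counts from the [ molecules ] section of a topology.
show_molecules() {
    awk '/^\[ *molecules *\]/ { on = 1; next }   # section starts
         /^\[/ { on = 0 }                        # any other section ends it
         on && NF && $1 !~ /^;/ { print $1, $2 }' "$1"
}

# Print the number of atoms declared on line 2 of a .gro file.
gro_atom_count() {
    awk 'NR == 2 { print $1 + 0 }' "$1"
}
```

After genbox and genion, `show_molecules topol.top` should list SOL and NA entries; if they are missing, the topology was not updated and the atom counts cannot agree.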



We are now ready to invoke mdrun to carry out the EM:


$ mdrun -v -deffnm em



The -v flag is for the impatient: it makes mdrun verbose, such that it prints its progress to the screen at every step. The -deffnm flag will define the file names of the input and output. So, if you did not name your grompp output "em.tpr," you will have to explicitly specify its name with the mdrun -s flag. In our case, we will get the following files:

em.log: ASCII-text log file of the EM process
em.edr: Binary energy file
em.trr: Binary full-precision trajectory
em.gro: Energy-minimized structure


writing lowest energy coordinates.

Steepest Descents converged to Fmax < 1000 in 1671 steps
Potential Energy  = -1.4910198e+06
Maximum force     =  8.8282404e+02 on atom 123
Norm of force     =  1.2215010e+01






There are two very important factors to evaluate to determine if EM was successful. The first is the potential energy (printed at the end of the EM process, even without -v). E_pot should be negative, and (for a simple protein in water) on the order of 10⁵-10⁶, depending on the system size and number of water molecules. The second important feature is the maximum force, F_max, the target for which was set in minim.mdp - "emtol = 1000.0" - indicating a target F_max of no greater than 1000 kJ mol^-1 nm^-1. It is possible to arrive at a reasonable E_pot with F_max > emtol. If this happens, your system may not be stable enough for simulation. Evaluate why it may be happening, and perhaps change your minimization parameters (integrator, emstep, etc).
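Both criteria can be checked mechanically. A minimal sketch, assuming the summary lines ("Potential Energy = ..." and "Maximum force = ...") appear in the file you pass in exactly as printed above; em_converged is a hypothetical helper name.

```shell
# Report E_pot and F_max from an EM summary and compare F_max with emtol.
# $1: file containing the EM summary lines, $2: emtol in kJ mol^-1 nm^-1.
# Exit status: 0 if E_pot < 0 and F_max < emtol, 1 if not, 2 if no summary found.
em_converged() {
    awk -v tol="$2" '
        $1 == "Maximum" && $2 == "force" { fmax = $4 }
        $1 == "Potential" && $2 == "Energy" { epot = $4 }
        END {
            if (fmax == "") exit 2
            ok = ((fmax + 0 < tol + 0) && (epot + 0 < 0))
            printf "Epot=%s Fmax=%s %s\n", epot, fmax, (ok ? "OK" : "CHECK")
            exit (ok ? 0 : 1)
        }' "$1"
}
```

With the values from this run and emtol = 1000, the helper reports OK; a CHECK result is the cue to revisit the minimization parameters as described above.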

Let's do a bit of analysis. The em.edr file contains all of the energy terms that GROMACS collects during EM. You can analyze any .edr file using the GROMACS tool g_energy:


$ g_energy -f em.edr -o potential.xvg




At the prompt, type "10 0" to select Potential (10); zero (0) terminates input. You will be shown the average of E_pot, and a file called "potential.xvg" will be written. To plot this data, you will need the Xmgrace plotting tool. The resulting plot should look something like this, demonstrating the nice, steady convergence of E_pot:




Statistics over 1670 steps [ 1.0000 through 1670.0000 ps ], 1 data sets
All statistics are over 1322 points (frames)

Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Potential                -1.44656e+06      24000    71431.2    -155083  (kJ/mol)



Energy Minimization plot
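If Xmgrace is not at hand, the .xvg file is plain text and can be inspected directly. The sketch below assumes the usual layout, with header lines starting with # or @ followed by two-column data; xvg_mean is a hypothetical helper.

```shell
# Average the second column of an .xvg file, skipping "#" and "@" header lines.
# Tip: the interactive selection can also be piped in, e.g.
#   echo "10 0" | g_energy -f em.edr -o potential.xvg
xvg_mean() {
    awk '$1 !~ /^[#@]/ && NF >= 2 { sum += $2; n++ }
         END { if (n) printf "%.1f\n", sum / n }' "$1"
}
```

`xvg_mean potential.xvg` gives a quick sanity check against the average that g_energy prints, computed over the frames stored in the file.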

Now that our system is at an energy minimum, we can begin real dynamics.

EM ensured that we have a reasonable starting structure, in terms of geometry and solvent orientation. To begin real dynamics, we must equilibrate the solvent and ions around the protein. If we were to attempt unrestrained dynamics at this point, the system may collapse. The reason is that the solvent is mostly optimized within itself, and not necessarily with the solute. It needs to be brought to the temperature we wish to simulate and establish the proper orientation about the solute (the protein). After we arrive at the correct temperature (based on kinetic energies), we will apply pressure to the system until it reaches the proper density.

Remember that posre.itp file that pdb2gmx generated a long time ago? We're going to use it now. The purpose of posre.itp is to apply a position restraining force on the heavy atoms of the protein (anything that is not a hydrogen). Movement is permitted, but only after overcoming a substantial energy penalty. The utility of position restraints is that they allow us to equilibrate our solvent around our protein, without the added variable of structural changes in the protein.
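The restraints only take effect if the run parameters switch them on. In the topology written by pdb2gmx, the posre.itp include is wrapped in "#ifdef POSRES", so the .mdp file needs a line such as "define = -DPOSRES". A minimal sketch of a check, assuming that convention is used in your nvt.mdp; posres_enabled is a hypothetical helper.

```shell
# Succeed (exit 0) if the .mdp file passes -DPOSRES on its "define" line.
# Assumption: position restraints are toggled with the POSRES macro, as in
# the default pdb2gmx topology.
posres_enabled() {
    grep -Eq '^[[:space:]]*define[[:space:]]*=.*-DPOSRES' "$1"
}
```

For example, `posres_enabled nvt.mdp && echo "restraints on"` confirms the setting before you call grompp.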

Equilibration is often conducted in two phases. The first phase is conducted under an NVT ensemble (constant Number of particles, Volume, and Temperature). This ensemble is also referred to as "isothermal-isochoric" or "canonical." The timeframe for such a procedure is dependent upon the contents of the system, but in NVT, the temperature of the system should reach a plateau at the desired value. If the temperature has not yet stabilized, additional time will be required. Typically, 50-100 ps should suffice, and we will conduct a 100-ps NVT equilibration for this exercise. Depending on your machine, this may take a while (just over an hour on a dual-core MacBook). Get the .mdp file here.
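The run length is set in the .mdp file as a number of steps (nsteps) times the time step (dt). As a rough sketch, assuming a 2 fs time step (dt = 0.002 ps, an assumption since the .mdp file is not reproduced here), the step count for a given duration can be computed with a hypothetical helper:

```shell
# Number of MD steps needed for a run of a given length.
# $1: run length in ps, $2: time step in ps.
steps_for() {
    # Add 0.5 before truncating so floating-point noise rounds correctly.
    awk -v t="$1" -v dt="$2" 'BEGIN { printf "%d\n", t / dt + 0.5 }'
}
```

`steps_for 100 0.002` prints 50000, the nsteps value that would give a 100-ps equilibration under that assumption.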

We will call grompp and mdrun just as we did at the EM step:

$ grompp -f nvt.mdp -c em.gro -p topol.top -o nvt.tpr

$ mdrun -deffnm nvt

A full explanation of the parameters used can be found in the GROMACS manual, in addition to the comments provided. Take note of a few parameters in the .mdp file:

gen_vel = yes: Initiates velocity generation. Using different random seeds (gen_seed) gives different initial velocities, and thus multiple (different) simulations can be conducted from the same starting structure.
tcoupl = V-rescale: The velocity rescaling thermostat is an improvement upon the Berendsen weak coupling method, which did not reproduce a correct kinetic ensemble.
pcoupl = no: Pressure coupling is not applied.
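To confirm what a given .mdp file actually sets for these parameters, a small lookup helper is enough. This is a sketch: mdp_get is a hypothetical name, and it assumes the plain "key = value ; comment" layout of .mdp files.

```shell
# Print the value assigned to a parameter in an .mdp file.
# $1: parameter name, $2: .mdp file.
mdp_get() {
    awk -F'=' -v key="$1" '
        { sub(/;.*/, "") }                          # strip trailing comments
        { k = $1; gsub(/[ \t]/, "", k) }            # normalize the key
        k == key { v = $2
                   gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim the value
                   print v }' "$2"
}
```

For instance, `mdp_get tcoupl nvt.mdp` should print V-rescale and `mdp_get pcoupl nvt.mdp` should print no if the file matches the settings described above.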

Let's analyze the temperature progression, again using g_energy:
$ g_energy -f nvt.edr


Thursday, February 12, 2015
