## [q-bio.BM] An Introduction to Biomolecular Simulations and Docking -- Mura and McAnany

http://arxiv.org/abs/1407.3752
http://arxiv.org/ftp/arxiv/papers/1407/1407.3752.pdf

Any arguments with the opinions in this post are welcome. I write this as (hopefully) a guide for those who are interested in the topic but do not want to dive head first into the paper. This paper is a wonderful explanation of molecular dynamics as a tool in computational biophysics, and I highly recommend reading it if you are interested in the field. This is the best paper I've read so far that covers the computational biophysicist's toolset as a whole.

--------------------------------------------------
My familiarity with subject: 9/10
Style: 8/10
Ease (for layman): 8/10
Length: 24pgs (excluding references and images)
--------------------------------------------------

Sec. 2.2. The paper outlines the necessity, accuracy, and implementation of molecular dynamics simulations as a tool for studying molecular biology. Since "some experimental methods are inherently limited for certain types of questions for any biomolecular system [...], computational approaches such as MD [molecular dynamics] simulation offer an appealing route to exploring [...] full atomic detail, particularly when the desired information is experimentally inaccessible" (page 2). MD simulations probe the region between classical, experimentally verified systems and physical chemistry. The tool for this region is known to the physicist as statistical mechanics.

Sec. 3.1. A brief overview of statistical mechanics for the non-physicist: it is "the theoretical framework linking the microscopic (atomic-level) properties of a molecule to its thermodynamic properties" (page 3). In other words, we know quantum mechanics extremely precisely, and we know classical thermodynamics. Statistical mechanics is the link between the two fields. A system of N particles with M states per particle has $M^N$ configurations. Since the particles also have kinetic energy, the system Hamiltonian (energy) enjoys a "virtual infinitude of potential configurations". Basically, large numbers of particles with interactions, positions, momenta, etc. result in a "combinatorial explosion" (page 3). Statistical mechanics takes a probabilistic approach since "any observable/bulk quantity become[s] so strongly spiked that the mean statistical values can be taken as a single, well-defined thermodynamic quantit[y]" (page 4).
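To get a feel for the "combinatorial explosion", here is a minimal sketch that just counts $M^N$. The numbers (3 states per particle, 10 or 100 particles) are my own illustrative assumptions, not values from the paper.

```python
import math

def count_configurations(states_per_particle: int, n_particles: int) -> int:
    """Exact number of configurations, M**N (Python ints do not overflow)."""
    return states_per_particle ** n_particles

def log10_configurations(states_per_particle: int, n_particles: int) -> float:
    """log10(M**N), handy when the exact number is too long to read."""
    return n_particles * math.log10(states_per_particle)

print(count_configurations(3, 10))              # 59049
print(round(log10_configurations(3, 100), 1))   # 47.7, i.e. roughly 10^47.7 configurations
```

Going from 10 to 100 particles takes the count from tens of thousands to a number with 48 digits, which is why enumerating states one by one is hopeless.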

Sec. 3.2. Now for the interactions on this scale. As for inter-molecular forces, we have electrostatic and van der Waals. The Coulomb force decays as the inverse square of the distance (the potential as the inverse distance), while (for convenience) the van der Waals interaction is modeled as a Lennard-Jones potential whose attractive part decays with the inverse distance to the sixth power. Other forces such as London dispersion, the hydrophobic effect, and hydrogen bonding can be considered electrostatic in nature. "In summary, electrostatics and vdW forces are what dictate the structure and energetics of biopolymer folding, assembly, and dynamics" (page 6).
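As a rough sketch of how differently these two interactions decay, here are the pair energies in Python. The unit system (kcal/mol, Angstroms, elementary charges), the approximate Coulomb constant, and the `epsilon`/`sigma` values are assumptions I picked for illustration; real force fields tabulate their own parameters.

```python
# Approximate Coulomb constant in kcal*Angstrom/(mol*e^2); assumed unit system.
COULOMB_CONSTANT = 332.06

def coulomb_energy(q1: float, q2: float, r: float) -> float:
    """Coulomb pair energy, decaying as 1/r."""
    return COULOMB_CONSTANT * q1 * q2 / r

def lennard_jones_energy(r: float, epsilon: float, sigma: float) -> float:
    """12-6 Lennard-Jones pair energy: r^-12 repulsion, r^-6 attraction."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# Compare how each energy falls off as the separation doubles.
for r in (3.0, 6.0, 12.0):
    print(r, coulomb_energy(1.0, -1.0, r), lennard_jones_energy(r, 0.1, 3.0))
```

Doubling the distance halves the Coulomb energy but cuts the Lennard-Jones attraction by roughly a factor of 64, which is why electrostatics is the long-range headache.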

Sec. 3.4. A degree of freedom (DoF) is a "well-defined parameter that quantifies some property [...] where the parameter is free to vary across a range of values independently of other DoFs" (page 7). For a system with n DoFs, the energy surface is an n-dimensional surface in (n+1)-dimensional space. A molecule of N atoms in three dimensions has 3N DoFs. When simulating a system, we may directly relate the relative populations of our system within configuration space to thermodynamic energy differences (from the Boltzmann distribution). If you picture the energy landscape as hills and basins within configuration space, the depth of a basin corresponds to enthalpy, and its width to entropy. Although this beautiful portrait of energetics presents itself in theory, adequate sampling of configuration space in reality is difficult.
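The population-to-energy relation from the Boltzmann distribution can be sketched in a few lines. The function name, the kcal/mol units, and the 300 K default are my own choices for illustration.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def free_energy_difference(pop_a: float, pop_b: float, temperature: float = 300.0) -> float:
    """Return G_A - G_B = -k_B T ln(p_A / p_B) in kcal/mol."""
    return -K_B * temperature * math.log(pop_a / pop_b)

# State A is visited ten times as often as state B in a (hypothetical) trajectory.
print(free_energy_difference(10.0, 1.0))
```

A tenfold difference in population at 300 K corresponds to a bit under 1.4 kcal/mol, so even modest energy differences skew the populations heavily.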

Sec. 3.5. Langevin dynamics is the general approach to this formulation of classical dynamics. The Langevin equation contains a frictional term and a noise term such that in the low-friction regime we recover Newtonian dynamics, and in the diffusive limit of large friction we recover Brownian dynamics.
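For reference, one common way to write the Langevin equation is below. The notation is my own and may differ from the paper's: $\gamma$ is a friction coefficient (units of inverse time), $U$ the potential energy, and $\eta(t)$ Gaussian white noise with $\langle \eta(t)\eta(t') \rangle = \delta(t-t')$.

$$m\,\ddot{x} = -\nabla U(x) - \gamma m\,\dot{x} + \sqrt{2\gamma m k_B T}\;\eta(t)$$

Setting $\gamma \to 0$ removes both the friction and noise terms and leaves Newton's $m\ddot{x} = -\nabla U(x)$. For large $\gamma$ the inertial term becomes negligible, giving the Brownian form

$$\gamma m\,\dot{x} = -\nabla U(x) + \sqrt{2\gamma m k_B T}\;\eta(t).$$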

It is important to sample all of configuration space. Sometimes deep troughs in the energy landscape go unsampled even though, by their Boltzmann weight, they would contribute disproportionately to the equilibrium ensemble average. If we do not sample adequately, we violate the axiom of statistical mechanics that "bulk/ensemble properties are calculated from a distribution" (page 9). If we violate this axiom, the tools of statistical mechanics fail to reflect the true properties of the system.

Sec. 4.2. How accurate is MD at the scale we're interested in? We can base our estimates on the de Broglie wavelength and the Born-Oppenheimer approximation. The thermal de Broglie wavelength is $$\Lambda = \frac{h}{\sqrt{2\pi m k_B T}},$$ where $h$ is Planck's constant, $k_B$ is Boltzmann's constant, $m$ is the particle mass, and $T$ is the temperature. If $\Lambda$ is much less than the average inter-particle separation in our system, we're totally cool. Luckily this is true for protein systems at typical temperatures.
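Plugging numbers into the formula for $\Lambda$ makes the argument concrete. This is a small sketch; the choice of atoms and the 300 K temperature are my assumptions, not a calculation taken from the paper.

```python
import math

PLANCK = 6.62607015e-34     # J*s
BOLTZMANN = 1.380649e-23    # J/K
AMU = 1.66053906660e-27     # kg per atomic mass unit

def thermal_de_broglie_angstrom(mass_amu: float, temperature_k: float) -> float:
    """Thermal de Broglie wavelength h / sqrt(2*pi*m*k_B*T), in Angstroms."""
    mass_kg = mass_amu * AMU
    wavelength_m = PLANCK / math.sqrt(2.0 * math.pi * mass_kg * BOLTZMANN * temperature_k)
    return wavelength_m * 1e10

for name, mass in (("H", 1.008), ("C", 12.011), ("O", 15.999)):
    print(name, round(thermal_de_broglie_angstrom(mass, 300.0), 2), "Angstrom")
```

At 300 K this gives about 0.3 Angstrom for carbon, comfortably below typical interatomic distances, and about 1 Angstrom for hydrogen, so the lightest atoms are where the classical picture is most strained.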

Electron density moves two orders of magnitude more quickly than the nuclei, so we are able to assume the quantum mechanical wavefunction is separable and can be factorized into nuclear and electronic components. At this point, we absorb the electronic component into the effective interatomic potentials and call it a day.

Sec. 4.3. SPEAKING OF WHICH, let's now talk about that wonderful interatomic potential. The potential energy function for the system is called the force-field (FF). This FF contains terms for bonds, angles, dihedrals, impropers, Lennard-Jones interactions, and Coulomb interactions. The constants used in these equations are generated from highly detailed quantum mechanical calculations and experiment.
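A typical functional form for such a force field looks like the following. This is the common textbook layout with my own symbol choices; the paper's exact notation and the details of any particular FF may differ.

$$U = \sum_{\text{bonds}} k_b (b - b_0)^2 + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2 + \sum_{\text{dihedrals}} k_\phi \left[1 + \cos(n\phi - \delta)\right] + \sum_{\text{impropers}} k_\omega (\omega - \omega_0)^2 + \sum_{i<j} 4\epsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \sum_{i<j} \frac{q_i q_j}{4\pi\epsilon_0 r_{ij}}$$

The first four sums are the bonded terms (each a cheap sum over a fixed list), while the last two run over pairs of atoms and dominate the cost.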

Sec. 4.4. With N atoms and a potential that includes pairwise interactions, a naive calculation is of complexity O($N^2$). With fancy computer science we reduce this to O($N \log N$). To move the system forward in time, the FF is evaluated at the present time. The negative gradient of the potential gives the force on each atom, and dividing by the atom's mass yields its acceleration. These accelerations are applied and each atom is moved forward in time by ~2 femtoseconds. This step is iterated until the desired simulation time is reached.
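The force, acceleration, step loop can be sketched with the velocity Verlet integrator, a widely used MD scheme. I am not claiming it is the specific integrator the paper describes. The 1D harmonic "bond" and the reduced units below are toy assumptions of mine.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance one particle in 1D by n_steps of size dt using velocity Verlet."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity from the current force
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force = -dU/dx at the new position
        v = v_half + 0.5 * dt * f / mass   # finish the velocity update
    return x, v

def spring_force(x, k=1.0):
    """Negative gradient of the toy potential U = 0.5*k*x**2."""
    return -k * x

x_final, v_final = velocity_verlet(1.0, 0.0, spring_force, mass=1.0, dt=0.01, n_steps=1000)
print(x_final, math.cos(10.0))  # numerical vs. exact position at t = 10
```

In a real simulation `force` would evaluate the full force field for every atom, and the time step would be on the order of the ~2 fs mentioned above.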

Sec. 4.5. With methods such as vdW force switching, particle-mesh Ewald, and pair-list distance, we can greatly reduce the complexity of the simulations. If you want a more detailed understanding of these methods, read page 13 (it is very well explained). Periodic boundary conditions are applied on all sides of the system so that the atoms see an infinite crystal rather than vacuum. When the system is crafted, a short minimization is undertaken so as to reduce ridiculously high potentials at the start since these potentials could cause the system to crash or explode. You must nurture the system. Love on it. Cradle it. Sometimes I find myself humming gentle lullabies to soothe my systems before their inevitable, violent spasms.
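Periodic boundary conditions are usually paired with the minimum-image convention: each atom interacts with the nearest periodic copy of its neighbor. Here is a minimal sketch assuming a cubic box; the function names and box size are hypothetical.

```python
import math

def minimum_image(delta: float, box_length: float) -> float:
    """Wrap one displacement component to the nearest periodic image."""
    return delta - box_length * round(delta / box_length)

def periodic_distance(pos_a, pos_b, box_length: float) -> float:
    """Distance between two points in a cubic periodic box."""
    return math.sqrt(sum(minimum_image(a - b, box_length) ** 2
                         for a, b in zip(pos_a, pos_b)))

# Two atoms near opposite faces of a 10 Angstrom box are really neighbors.
print(periodic_distance((0.5, 0.5, 0.5), (9.5, 0.5, 0.5), 10.0))  # 1.0
```

Without the wrap, those two atoms would appear to be 9 Angstroms apart and their interaction would be badly underestimated.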

Sec. 4.7. The authors detail various methods for analysis such as root mean square deviation and fluctuation (RMSD/RMSF), principal component analysis (PCA), and radial distribution functions. For a detailed account, just read pages 14-16.
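As a flavor of what these analyses compute, here is a bare-bones RMSD between two coordinate sets. It assumes the structures have already been superimposed and that the atoms are listed in the same order. Real analysis tools do an optimal alignment first, which this sketch skips.

```python
import math

def rmsd(coords_a, coords_b) -> float:
    """Root mean square deviation between two equally sized lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and the same length")
    squared = sum((a - b) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for a, b in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))

reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]   # every atom moved 1 unit along z
print(rmsd(reference, shifted))  # 1.0
```

Plotting this value for each frame against the starting structure is the usual first check of whether a trajectory has settled down.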

Sec. 4.7.3. Sometimes particle-mesh Ewald will crash your system if you haven't neutralized it with counter-ions. Good to know.

When simulating, there is a trade-off between computational cost and sampling. For the best results, try to maintain as close to an equilibrium as possible, and make sure your system appears to be hitting all of configuration space.

Sec. 4.8. Simulated annealing is a process for better sampling of configuration space. The system is heated to ungodly temperatures (so as to provide the necessary thermal energy for surmounting potential mountains), then cooled by a prescribed cooling schedule to reasonable temperatures.
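The heat-then-cool idea can be sketched on a toy problem. To keep it short I use Metropolis Monte Carlo moves rather than MD, and the 1D double-well potential, geometric cooling schedule, and all parameter values are my own assumptions rather than anything from the paper.

```python
import math
import random

def double_well(x: float) -> float:
    """Toy landscape: shallow well near x = +1, deeper well near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def simulated_annealing(energy, x0, t_start=2.0, t_end=0.01,
                        n_steps=20000, step_size=0.2, seed=0):
    """Metropolis sampling while the temperature decays geometrically."""
    rng = random.Random(seed)
    x, u = x0, energy(x0)
    best_x, best_u = x, u
    cooling = (t_end / t_start) ** (1.0 / n_steps)
    temperature = t_start
    for _ in range(n_steps):
        trial = x + rng.uniform(-step_size, step_size)
        u_trial = energy(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if u_trial <= u or rng.random() < math.exp(-(u_trial - u) / temperature):
            x, u = trial, u_trial
            if u < best_u:
                best_x, best_u = x, u
        temperature *= cooling
    return best_x, best_u

# Start in the shallower well; the hot phase lets the walker cross the barrier.
best_x, best_u = simulated_annealing(double_well, x0=1.0)
print(best_x, best_u)
```

The early high temperature makes the barrier between the wells easy to cross, and the slow cooling then lets the walker settle into whichever basin it ends up favoring.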

Sec. 5. The authors then delve into computational docking. Basically, molecules touch sometimes, and it's helpful to know what's going on while that happens. I did not read much further since the paper became much more focused and lost my attention quickly.