Saturday, July 26, 2014

Review and application of group theory to molecular systems biology [part 1]

Edward A Rietman, Robert L Karp, and Jack A Tuszynski; Theor Biol Med Model. 2011; 8: 21.


My opinion: I love it when physicists stick mathematics where it doesn't belong. This paper is a bit old, I know (Elwin wanted recent papers), but I really wanted to read it, and I figured I'd summarize it for this blog while I'm at it. I'm going to break this up into a few pieces, since I want to explain concepts as we go for readers new to abstract algebra.

Any arguments with opinions or analyses in this summary are warmly welcomed.

Ratings for part 1 only:

General: 9/10
My familiarity with subject: 7/10
Style: 9/10
Ease (for layman scientist with no algebra background): 7/10?
Length: 7pgs [total paper: 27pgs] (excluding references and images)


According to general biology textbooks, life is characterized by

  • metabolism
  • self-maintenance
  • duplication involving genetic material
  • evolution by natural selection

But many complex details of life are overlooked or unaccounted for by this list. The universe may be considered a Riemannian resonator and "life can be thought of as some sort of machinery the universe uses to diminish energy gradients" (page 1). The foundations of physics are based in group theory: \(SU(3)\) for quarks (the strong interaction) and \(SU(2)\times U(1)\) for electroweak interactions.

Group Theory

For the reader not well versed in group theory, I suggest following Nick's post on particle physics. But I will go over the general concepts anyway.

A group is a set of objects that behave in a certain way under some operation (addition, multiplication, etc.). More formally, a group \(G\)  is a nonempty set with a binary operation, \(\cdot\), which satisfies:
  1. Closure -- any two elements of \(G\) combined return another element in \(G\).
  2. Associativity -- how you group (parenthesize) a chain of elements does not matter, although the left-to-right order of the elements usually does.
  3. Identity -- there exists an element that causes no effect when combined with any other element.
  4. Inverses -- For every element, there exists another that may be combined with it to return the identity.
Or stated even more formally,
  1. \(\forall g,g' \in G\ \ g\cdot g' \in G\).
  2. \(\forall a,b,c \in G\ \ a\cdot(b \cdot c) = (a\cdot b)\cdot c\).
  3. \(\exists e \in G\ \forall g \in G\ g\cdot e = e\cdot g = g\).
  4. \(\forall g \in G \ \exists g^{-1} \in G\ gg^{-1} = g^{-1}g = e\).
However, the authors of this paper suspiciously leave out condition 1, which in my mind is absolutely terrifying when considering a group.
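
To see why closure deserves its own line, here is a minimal brute-force sketch that checks all four conditions on a small finite set. The function name `is_group` and the mod-4 examples are my own illustration, not anything from the paper.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of the four group axioms on a finite set."""
    elements = list(elements)
    # 1. Closure: every combination lands back in the set.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # 2. Associativity: a(bc) == (ab)c for every triple.
    if any(op(a, op(b, c)) != op(op(a, b), c)
           for a, b, c in product(elements, repeat=3)):
        return False
    # 3. Identity: some e with e*g == g*e == g for all g.
    identities = [e for e in elements
                  if all(op(e, g) == g and op(g, e) == g for g in elements)]
    if not identities:
        return False
    e = identities[0]
    # 4. Inverses: every g has some h with g*h == h*g == e.
    return all(any(op(g, h) == e and op(h, g) == e for h in elements)
               for g in elements)

def add_mod4(a, b):
    return (a + b) % 4

def mul_mod4(a, b):
    return (a * b) % 4

print(is_group(range(4), add_mod4))   # integers mod 4 under addition
print(is_group(range(4), mul_mod4))   # same set under multiplication (0 has no inverse)
print(is_group([1, 2, 3], add_mod4))  # not closed: 1 + 3 = 0 is missing
```

Only the first call passes. The third fails on closure alone, which is exactly the condition the authors skipped.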

"Isomorphic" is a concept that basically means that two groups behave the same way, and if the elements of one group were relabeled, they would in fact be identical. This is important because "Any group [...] is isomorphic to a subgroup of matrix groups" (page 4). And a subgroup, just so you know, is exactly what you'd want it to be -- a group inside of another group.

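A tiny example of "relabeling" (my own, not the paper's): the integers mod 4 under addition behave exactly like the four rotations of the plane by multiples of 90 degrees under matrix multiplication. The sketch below checks that the hypothetical relabeling `rotation(k)` respects the operation and is one-to-one, which is what makes it an isomorphism onto a matrix subgroup.

```python
# Rotation by 90 degrees as an exact integer matrix.
R90 = ((0, -1),
       (1,  0))
IDENTITY = ((1, 0),
            (0, 1))

def matmul(A, B):
    """Multiply two 2x2 matrices stored as tuples of tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def rotation(k):
    """Image of k in Z_4: the rotation by k * 90 degrees."""
    M = IDENTITY
    for _ in range(k % 4):
        M = matmul(M, R90)
    return M

# The relabeling k -> rotation(k) respects the operation ...
respects_operation = all(
    rotation((a + b) % 4) == matmul(rotation(a), rotation(b))
    for a in range(4) for b in range(4)
)
# ... and never sends two different elements to the same matrix.
one_to_one = len({rotation(k) for k in range(4)}) == 4

print(respects_operation, one_to_one)
```
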
Some important groups with associated general properties:
  • Orthogonal groups \(O(n)\) -- group of rotations in \(n\)-dimensional Euclidean space including reflections.
  • Special orthogonal groups \(SO(n)\) -- groups of rotations in \(n\)-dimensional Euclidean space excluding reflections.
  • Unitary groups \(U(n)\) -- The inverse of an element is its complex conjugate transpose.
  • Special unitary groups \(SU(n)\) -- Inverses are complex conjugate transposes, and the determinant is exactly \(1\).
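
For the skeptical reader, a small numerical sketch of these defining properties on 2-by-2 matrices. The three example matrices and the helper names are my own choices for illustration. A real orthogonal matrix is just a unitary matrix with real entries, so one check covers both families.

```python
import math

def conj_transpose(M):
    """Complex conjugate transpose of a 2x2 matrix."""
    return [[M[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def is_unitary(M, tol=1e-12):
    """True if the conjugate transpose of M is its inverse."""
    P = matmul(conj_transpose(M), M)
    return all(abs(P[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(2) for j in range(2))

theta = 0.7  # arbitrary angle in radians
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]   # should be in SO(2)
reflection = [[1.0, 0.0],
              [0.0, -1.0]]                         # in O(2) but not SO(2)
s = 1 / math.sqrt(2)
mixer = [[s, s * 1j],
         [s * 1j, s]]                              # should be in SU(2)

for name, M in [("rotation", rotation), ("reflection", reflection), ("mixer", mixer)]:
    print(name, is_unitary(M), det(M))
```

All three pass the unitarity check. The determinant is what separates the "special" groups (determinant 1) from the rest (the reflection has determinant -1).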

Genetic Code

Ribosomes read mRNA (a chain of nucleotides) and output proteins (chains of amino acids). The unit of translation is the "codon", a run of three nucleotides; each codon codes for a specific amino acid or a stop signal. Mathematically, we may regard the set of codons as the Cartesian product of the set of nucleotides \(S = \{U, C, A, G\}\) with itself three times, \(S \times S \times S\), which yields \(4^3 = 64\) possible codons. Now hold on, because this is going to get fun.
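
The counting is easy to reproduce. A minimal sketch, with variable names of my own choosing:

```python
from itertools import product

S = "UCAG"  # the four letters
codons = ["".join(triple) for triple in product(S, repeat=3)]

print(len(codons))   # 64
print(codons[:5])    # ['UUU', 'UUC', 'UUA', 'UUG', 'UCU']
```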

Since there are only 20 amino acids, not 64, many codons code for the same amino acid. Looking only at the first two letters of each codon (the doublet), we can compile two lists:
  • \(M_1 = \{ AC, CC, CU, ...\}\)
  • \(M_2 = \{ CA, AA, AU, ... \}\)
The first set \(M_1\) corresponds to doublets whose third nucleotide doesn't matter: whichever nucleotide follows a doublet from \(M_1\), the resulting amino acid is the same. The second set \(M_2\) corresponds to doublets whose third nucleotide absolutely matters: a doublet from \(M_2\) does not determine the amino acid on its own.
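
The two lists can be rebuilt from the standard genetic code table. The sketch below is my own; the table string is the usual one-letter amino acid listing in UCAG order with `*` for stop, and it is not taken from the paper.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in UCAG order; '*' marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

M1, M2 = set(), set()
for first, second in product(BASES, repeat=2):
    doublet = first + second
    # Every outcome reachable by appending a third letter to this doublet.
    outcomes = {CODE[doublet + third] for third in BASES}
    (M1 if len(outcomes) == 1 else M2).add(doublet)

print(sorted(M1))  # third letter is irrelevant
print(sorted(M2))  # third letter changes the outcome
```

The 16 doublets split evenly, eight in each set.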

With these sets in hand, we may define operations on the letters themselves. Each operation swaps the four letters in pairs, and there are three ways to do that:
  • \(\alpha: A \leftrightarrow C\) and \(U \leftrightarrow G\).
  • \(\beta: A \leftrightarrow U\) and \(C \leftrightarrow G\).
  • \(\gamma: A \leftrightarrow G\) and \(U \leftrightarrow C\).
or written in permutation notation,
  • \(\alpha =\begin{pmatrix}A & U & G & C \\ C & G & U & A\end{pmatrix}\)
  • \(\beta =\begin{pmatrix}A & U & G & C \\ U & A & C & G\end{pmatrix}\)
  • \(\gamma =\begin{pmatrix}A & U & G & C \\ G & C & A & U\end{pmatrix}\)
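
A quick sketch (my own helper names) that composes the three swaps above, together with the do-nothing permutation, and prints the full composition table:

```python
from itertools import product

BASES = "AUGC"
identity = {b: b for b in BASES}
alpha = {"A": "C", "C": "A", "U": "G", "G": "U"}
beta  = {"A": "U", "U": "A", "C": "G", "G": "C"}
gamma = {"A": "G", "G": "A", "U": "C", "C": "U"}
NAMES = {"e": identity, "alpha": alpha, "beta": beta, "gamma": gamma}

def compose(f, g):
    """Apply g first, then f."""
    return {b: f[g[b]] for b in BASES}

def name_of(perm):
    """Look up which of the four named permutations this one is."""
    return next(n for n, p in NAMES.items() if p == perm)

# Composition table: table[(x, y)] is the name of "x after y".
table = {(x, y): name_of(compose(NAMES[x], NAMES[y]))
         for x, y in product(NAMES, repeat=2)}

for x in NAMES:
    print(x.ljust(6), [table[(x, y)] for y in NAMES])
```

Every product is again one of the four, each swap undoes itself, and composing any two different swaps gives the third.
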
Together with the identity (do nothing), these three swaps form a group under composition. In fact, this group is isomorphic to something known as the Klein four-group, the symmetry group of a (non-square) rectangle in two dimensions. Two scientists extended this representation to a 4-dimensional hypercube, which looks absolutely crazy.

A subgroup \(N\) of \(G\) is called a normal subgroup if any element of \(N\) may be multiplied on the left by any element of \(G\), multiplied on the right by that element's inverse, and still be an element of \(N\), or stated formally, \(N \unlhd G \iff \forall n\in N,\ \forall g \in G,\ gng^{-1}\in N\). It's just a fancy subgroup that has absolutely terrible philosophical implications upon closer inspection. The four normal subgroups for the \((K4 \times K4)\) representation shown above are written on page 6.
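
In a commutative group like the Klein four-group every subgroup is automatically normal, so to see the definition actually bite you need something non-commutative. The sketch below uses the six permutations of three objects, which is my own choice of example and not from the paper, and tests the condition \(gng^{-1} \in N\) by brute force.

```python
from itertools import permutations

# All six permutations of (0, 1, 2); p[i] is where i is sent.
S3 = list(permutations(range(3)))

def compose(p, q):
    """Apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def is_normal(N, G):
    """True if g n g^-1 stays in N for every g in G and n in N."""
    return all(compose(compose(g, n), inverse(g)) in N for g in G for n in N)

rotations = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}  # identity and the two 3-cycles
one_swap = {(0, 1, 2), (1, 0, 2)}              # identity and a single swap

print(is_normal(rotations, S3))  # True
print(is_normal(one_swap, S3))   # False
```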

With all this work done, we may now see some meaning. We may develop a 64-dimensional hypercube of general genetic code (as stated above) by \(D = \{A, U, G, C\} \otimes \{A, U, G, C\} \otimes \{A, U, G, C\}\). The symmetry operations on this space are the codons. Multiple vertices code for the same amino acid, so the mapping from codons to amino acids is many-to-one: surjective, but not injective.
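
To put numbers on how many codons share an amino acid, here is a short count over the standard genetic code. The table string is the usual one-letter listing in UCAG order with `*` for stop; it is my addition, not the paper's.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in UCAG order; '*' marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
code = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# How many codons land on each outcome (20 amino acids plus stop).
degeneracy = Counter(code.values())

print(len(code), "codons ->", len(degeneracy), "outcomes")
for symbol, count in degeneracy.most_common():
    print(symbol, count)
```

Leucine, serine, and arginine each get six codons, while methionine and tryptophan get only one apiece.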

If we continue even further and include time evolution, then we have a 65-dimensional differentiable information space manifold \(M[X]\). It is actually postulated that evolution is a geodesic in this information spacetime. Holy shit, right? You should rather be thinking bullshit, but let's continue.

We may define a metric between species (polynucleotide trajectories) statistically by
$$ d = \left[ \sum\limits_{\mu} \left(x'^{\mu} - x^{\mu}\right)^2\right]^{\frac{1}{2}} $$
the regular ol' Euclidean metric. From here we may "see regions of the information-spacetime that have not been explored by evolution" (page 7).
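
In code the metric is a one-liner. The coordinates below are made-up placeholders, since the post does not say how a species is actually turned into a point \(x^{\mu}\); treat the function name and the inputs as hypothetical.

```python
import math

def euclidean_distance(x, x_prime):
    """d = sqrt(sum over mu of (x'_mu - x_mu)^2)."""
    if len(x) != len(x_prime):
        raise ValueError("both points need the same number of coordinates")
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, x_prime)))

# Hypothetical coordinates for two "species".
point_a = [0.0, 0.0, 0.0]
point_b = [3.0, 4.0, 0.0]

print(euclidean_distance(point_a, point_b))  # 5.0
```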

We may actually analyze our system in terms of symmetry breaking within a higher dimensional Lie algebra. Starting from the symplectic Lie algebra \(sp(6)\), we may break its symmetry step by step to arrive at our system:
  1. \(sp(6) \supset \left[sp(4) \otimes su(2)\right]\)
  2. \(\left[sp(4) \otimes su(2)\right] \supset \left[su(2)\otimes su(2) \otimes su(2)\right]\)
  3. \(\left[su(2)\otimes su(2) \otimes su(2) \right]\supset\left[ su(2)\otimes u(1) \otimes su(2)\right]\)
  4. \(\left[su(2)\otimes u(1) \otimes su(2)\right] \supset \left[su(2) \otimes u(1)\right]\)
  5. \(\left[su(2) \otimes u(1)\right] \supset u(1)\)
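
As a sanity check on the chain above (my arithmetic, not the paper's), count dimensions using \(\dim sp(2n) = n(2n+1)\), \(\dim su(2) = 3\), and \(\dim u(1) = 1\), and add the dimensions of the pieces at each step:
$$ \underbrace{21}_{sp(6)} > \underbrace{10 + 3}_{sp(4)\otimes su(2)} > \underbrace{3 + 3 + 3}_{su(2)\otimes su(2)\otimes su(2)} > \underbrace{3 + 1 + 3}_{su(2)\otimes u(1)\otimes su(2)} > \underbrace{3 + 1}_{su(2)\otimes u(1)} > \underbrace{1}_{u(1)} $$
The dimension drops at every step (21, 13, 9, 7, 4, 1), which is at least consistent with each algebra sitting properly inside the one before it.
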
We end on page 7 of 29 (the end of this discussion on codons).