Commentary: Professor Ard Louis on symmetry in evolution

Biological physics
Rudolf Peierls Centre for Theoretical Physics

Professor Ard Louis is a theoretical physicist here at the Rudolf Peierls Centre for Theoretical Physics. His latest paper, written with an interdisciplinary team of collaborators, identifies a novel evolutionary mechanism to explain the preponderance of symmetry in the natural world. Here he explains the work behind the paper and its findings.

Engineers routinely design in properties such as symmetry or modularity, because they make products more robust to perturbations and easier to adapt in the future. In the natural world, we also find a preponderance of symmetry and modularity. It might be tempting to think that this pattern is caused by evolution selecting for these properties, since they bring important advantages in this biological context. However, evolution can’t plan in advance; it can only act on local, short-term fitness advantages. So how do these properties, which are advantageous only in the long term, arise?

A key step in our analysis is to distinguish the two main steps in Darwinian evolution. First, random mutations generate new variation in organismal traits, also called phenotypes. Next, natural selection means that phenotypes with higher fitness will eventually come to dominate in a population. Most evolutionary theory concentrates on this second step. But what if the first 'arrival of variation' step is much more likely to generate phenotypes high in symmetry or modularity? Could that lead to the bias towards these traits that we observe in nature?

A bias towards symmetry in protein complexes

To investigate this question, we studied over 34,000 protein clusters in the Protein Data Bank (PDB). Proteins are the molecular workhorses of the cell, and their 3D structure is critically important in determining their function. About 50% of proteins in the PDB are found in complexes of two or more proteins. After the proteins are made individually in the cell, they self-assemble into these multimeric forms. Using graph-theoretical techniques, we classified the symmetries of all these clusters, and found that they were indeed highly biased towards symmetric structures. We also showed that these symmetric structures represent a minuscule fraction of all the possible structures. Clearly there is an extraordinarily strong bias towards symmetry in protein clusters.

In the cell, proteins evolve patches on their surfaces by which they stick to one another in the self-assembly process. To understand the fundamental evolutionary mechanisms that lead to the bias towards simplicity, we studied a simple model of self-assembling squares called 'polyominoes' (a domino is made of two squares, a tromino of three squares, and so on). Just like the proteins, the squares have sides that stick to one another, leading to a self-assembly process similar to the one by which protein clusters form.

Next, we created a population of randomly designed squares, and used an evolutionary algorithm to mutate and select new squares, with the goal at each step of creating self-assembling polyominoes. As an example, we set the goal to be a self-assembling 16mer, but did not specify which 16mer polyomino to make. There are in fact 13,079,255 different 16mer topologies (someone counted them!), and interestingly only 5 have D4 symmetry, the symmetry of a square, which means that you can flip them or rotate them and they stay the same. Since we didn’t specify which 16mer to make, you might expect to find a D4-symmetric tile with roughly a 5-in-13-million chance. What we found instead was that our evolutionary algorithm fixed on one of these five D4 shapes about one third of the time. That is an extraordinary bias towards these high-symmetry structures.
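The size of this enrichment can be worked out directly from the numbers above (a quick sketch; 13,079,255 and 5 are taken from the text, and 'about one third' is treated as exactly 1/3):

```python
# Naive expectation: if all 13,079,255 possible 16mer topologies were equally
# likely to be found, a D4-symmetric one (there are only 5) would appear with:
naive = 5 / 13_079_255          # roughly 4 in 10 million

# Observed: the evolutionary algorithm fixed a D4 shape about 1/3 of the time.
observed = 1 / 3

# Bias factor: how much more often D4 symmetry appears than chance predicts.
bias = observed / naive
print(f"bias factor ~ {bias:,.0f}")   # close to a million-fold enrichment
```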

The reason for this bias can’t be natural selection, since we only select on the size 16, not on which 16mer to make. A closer look at our simulations showed that, upon random mutations of the tiles, you are much more likely to obtain a self-assembling 16mer with high symmetry than one with low symmetry. In other words, the mechanism that delivers the high symmetry here is not natural selection, but rather the first step in the evolutionary process: random mutations of the genotypes (here describing the bonding patterns of the squares) produce a strong bias towards high symmetry in the arrival of variation.

An algorithmic twist on the infinite monkey theorem

So what is the fundamental cause of this strong bias in the arrival of variation? To gain some intuition, it is helpful to break the process that generates new phenotypic variation into two further steps. First, there are random mutations that change the genetic material (the genotypes). Next, the genotype determines the new phenotype by the way it encodes a set of biophysical processes called development.

Some intuition can be gleaned from the famous infinite monkey theorem of monkeys typing on keyboards. In the standard story, if the keyboard has M keys, then every output of length N has equal probability (1/M)^N. So, given enough time, the monkeys will eventually type out anything, including the works of Shakespeare, although it would take unimaginably long to do so.

While this picture is pretty much fine for describing how genotypes change, what we really care about is the phenotypes. And there, a twist on the infinite monkey theorem provides better intuition. What if the monkeys were instead typing into a computer programming language? They might accidentally type the 21-character sequence ‘print "01" 500 times’, which would generate a string of length 1000 of the form 0101010101…. But instead of this happening with a probability (1/M)^1000, as in the original infinite monkey theorem, it would now happen with a probability (1/M)^21, which is exponentially more likely. If you didn’t initially know whether you were seeing the direct outputs of the monkeys’ keyboards or the outputs of random computer programmes, you could easily tell by noticing whether outputs that can be described by short programmes occurred much more often.
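To see just how large this advantage is, compare the two probabilities on a log scale (a sketch; the keyboard size M = 40 is an arbitrary assumption, not a figure from the text):

```python
import math

M = 40    # assumed number of keys on the monkeys' keyboard
N = 1000  # length of the target string 0101...

# Typing the string directly: probability (1/M)^N
log10_direct = -N * math.log10(M)

# Typing a 21-character program that prints it: probability (1/M)^21
log10_program = -21 * math.log10(M)

# The program route wins by well over a thousand orders of magnitude
advantage = log10_program - log10_direct
print(f"program route is ~10^{advantage:.0f} times more likely")
```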

Formalising the infinite monkey theorem with AIT

This algorithmic intuition can be formalised in a mathematical field called algorithmic information theory (AIT). Its key idea is that the complexity of an output is most fundamentally described as the length of the shortest program that will produce that output on a universal Turing machine (UTM), a fundamental model of a computer. This kind of complexity is called Kolmogorov complexity, or sometimes, more loosely, descriptional complexity. The formal link to our algorithmic twist on the infinite monkey theorem comes from the famous coding theorem of AIT, which states that if you feed random programs into a UTM, it is exponentially more likely to produce outputs with low Kolmogorov (descriptional) complexity. This deep result from AIT is absolutely amazing, and should be much more widely known and taught than it currently is!
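Schematically, the coding theorem relates the probability P(x) that a UTM fed uniformly random programs produces output x to the Kolmogorov complexity K(x) of that output (a sketch of the standard statement, up to an additive constant):

```latex
P(x) \;=\; 2^{-K(x) \,+\, O(1)}
```

so every extra bit of complexity halves the probability of the output appearing.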

The connection between AIT and evolution follows if we think of random mutations to genotypes as akin to changing the programs at random, and of the biological phenotype (the variation) as analogous to the output of a computational device. We derived a mathematical theorem, similar to the full AIT coding theorem, that works for these genotype-phenotype (GP) maps. As you may already have wondered, GP maps are not usually UTMs, but our theory works for a broader set of input-output maps. For GP maps, it predicts that phenotypes with high probability (for physicists, ones that have high sequence-space entropy) will have low Kolmogorov complexity.
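For such maps the exact coding theorem no longer applies, and the result takes a weaker, upper-bound form (sketched here from memory of the published work, not spelled out in the text above; K-tilde denotes a computable approximation to the Kolmogorov complexity, and a and b are map-dependent constants):

```latex
P(x) \;\le\; 2^{-a\,\tilde{K}(x)\,-\,b}
```

The bound says that any high-probability phenotype must be simple, though not every simple phenotype need be probable.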

The link to symmetry follows directly. A symmetric shape can be captured by a short description, for example 'take a component and repeat it N times'. By contrast, an asymmetric shape needs a longer description, since you must specify in more detail where each component sits. So high symmetry typically corresponds to low Kolmogorov complexity.
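This link between symmetry and compressibility can be illustrated with an off-the-shelf compressor standing in for the (uncomputable) Kolmogorov complexity — a rough sketch with made-up strings, not the methodology of the paper itself:

```python
import random
import zlib

def k_proxy(s: str) -> int:
    """Crude descriptional-complexity proxy: compressed size in bytes."""
    return len(zlib.compress(s.encode()))

# A 'symmetric' shape description: one component repeated many times
symmetric = "AB" * 500

# An 'asymmetric' one: each of the 1000 positions specified independently
random.seed(0)
asymmetric = "".join(random.choice("AB") for _ in range(1000))

# The repetitive (symmetric) description compresses far better
print(k_proxy(symmetric), k_proxy(asymmetric))
```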

AIT therefore predicts not just a bias towards symmetry in the arrival of variation, but more specifically that this should follow an exponential bias towards low descriptional (Kolmogorov) complexity. We tested this specific functional form for both the protein clusters and for the polyominoes, and our detailed AIT prediction works incredibly well. This good agreement suggests that the origin of the high symmetry in the protein clusters is caused not by natural selection, but instead by the algorithmic nature of the random process which produces variation.

In other words, while mutations to genotypes are, to first order, entirely random, what we really care about in evolution is the phenotypes that this process produces. And the probability with which these phenotypes appear is not uniformly random at all. Instead, there is an incredibly strong bias towards phenotypes with low descriptional complexity. In the case of structural phenotypes, this algorithmic process is spontaneously biased towards high symmetry. And apparently, these symmetric phenotypes are good enough to provide the function needed for natural selection to fix them in a population.

Finding similar signatures

Can we find similar signatures in other biological systems? We first looked at the folded shapes of RNA molecules, one of the other key molecules of life (together with DNA and proteins). While RNA is best known for carrying messages copied from DNA, it can also have functional roles much like proteins do. And, just as for proteins, the folded structure of functional RNA is key to understanding what tasks it performs in the cell. One way of measuring the shape of a folded RNA is through its secondary structure, which records which parts of the molecule bind to which. These secondary structures can be abstracted as binary strings, which allows us to measure their complexity using standard compression techniques, not dissimilar to those used to compress text on your computer. Again, as predicted by our algorithmic picture of evolution, the frequency with which RNA secondary structures are found in nature very closely follows the exponential prediction from AIT. In other words, for every extra bit by which you can compress the description of an RNA secondary structure, you double the frequency with which it appears in nature.
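The 'one extra bit halves the frequency' rule can be made concrete with a compression-based complexity proxy (a sketch: zlib stands in for the compressors used in the work, and the dot-bracket-style strings below are made-up illustrations, not real RNA structures):

```python
import random
import zlib

def k_bits(s: str) -> int:
    """Descriptional-complexity proxy: compressed size in bits."""
    return 8 * len(zlib.compress(s.encode()))

# A highly regular secondary structure: one hairpin motif repeated
regular = "((((....))))" * 20

# An irregular string of the same length over the same alphabet
random.seed(1)
irregular = "".join(random.choice("(.)") for _ in range(240))

# AIT-style prediction: each extra bit of complexity halves the frequency,
# so the irregular structure should be ~2^extra_bits times rarer in nature
extra_bits = k_bits(irregular) - k_bits(regular)
print(f"irregular needs {extra_bits} more bits -> ~2^{extra_bits} times rarer")
```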

We also looked at a non-structural phenotype: the gene-regulatory network that controls the cell cycle, the process by which a cell divides. Specifically, for the well-studied network of budding yeast, we found that random sampling of possible gene-regulatory networks generated an exponential bias towards networks that produce simple outputs. The cell-cycle network found in nature for budding yeast is one of the most likely to appear. This evidence again suggests that a strong bias in the arrival of variation helped shape how the yeast cell cycle is regulated.
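The flavour of such a random-sampling experiment can be caricatured with tiny random Boolean threshold networks (a toy sketch only, not the actual yeast cell-cycle model; the network size, update rule, weights, and seed are all arbitrary assumptions):

```python
import random
import zlib
from collections import Counter

random.seed(0)
N = 4  # genes
T = 8  # time steps recorded

def step(state, weights):
    # Threshold update: gene i switches on if its weighted input is positive
    return tuple(
        1 if sum(w * s for w, s in zip(row, state)) > 0 else 0
        for row in weights
    )

def phenotype(weights):
    # The 'phenotype' is the expression trajectory from a fixed initial state
    state = (1,) * N
    traj = []
    for _ in range(T):
        state = step(state, weights)
        traj.append("".join(map(str, state)))
    return " ".join(traj)

# Sample random networks and count how often each trajectory appears
counts = Counter()
for _ in range(2000):
    weights = [[random.choice((-1, 0, 1)) for _ in range(N)] for _ in range(N)]
    counts[phenotype(weights)] += 1

# Far fewer distinct phenotypes than samples: the arrival of variation is
# heavily concentrated, and the most common trajectories are highly regular
common, n = counts.most_common(1)[0]
print("distinct phenotypes:", len(counts))
print("most common:", common, "| count:", n,
      "| compressed bytes:", len(zlib.compress(common.encode())))
```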

The question of whether or not bias in the arrival of variation has an impact on evolutionary outcomes has been hotly contested for many decades. One strength of the three examples above is that they are simple enough to allow us to address this question head on, with clear results pointing towards the critical importance of such bias.

For physicists, there is an intriguing connection to statistical mechanics. Essentially, the story above is a prediction that low-complexity phenotypes have large sequence-space entropy. While there are complications because the non-equilibrium evolutionary dynamics here are not in steady state, a direct analogy can be made to statistical mechanics by equating fitness with the (negative) energy. As every physicist knows, it is then crucial also to take the entropy into account, defining a 'free fitness' analogous to the free energy.
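Schematically, the analogy reads (a sketch; the identification of the effective temperature with inverse population size follows standard population-genetics treatments and is an assumption not spelled out in the text above):

```latex
F \;=\; E - TS
\qquad\longleftrightarrow\qquad
\Phi \;=\; \bar{f} + T_{\mathrm{eff}}\,S ,
\qquad f \leftrightarrow -E
```

so phenotypes compete not only on their mean fitness f-bar but also on how much of sequence space maps to them (their entropy S), with an effective temperature that scales roughly as the inverse of the population size.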