When you think of a cell, you might imagine a balloon-like structure with different organelles inside, each performing a specific function. Some might be breaking down molecules and producing energy while others could be storing information. One of the most important jobs in the cell is to produce proteins — in fact, it takes a whole factory’s worth of blueprints, assembly, quality control, packaging, and shipping to produce the vast number of proteins that are needed for everyday life.
Like other macromolecules, proteins are constructed from a set of common building blocks called amino acids. Each of the 20 amino acids is a small molecule, nearly identical to the others, that is made unique by its side chain (also called the R-group). Each side chain lends different chemical properties to the amino acid, changing its role in the overall structure of a protein.
Depending on the sequence of a protein’s amino acids and how they’re folded together, many different structures can be made. This is vital, as a protein’s structure directly begets its function.
Since structure is so important, you might expect there to be robust pathways in place to shape each protein piece by piece so that they are all perfect, with every step of the process being tightly controlled. However, the reality is that proteins are most often assembled by sticking the sequence of amino acids together end to end and then ejecting them into the cytoplasm. Some proteins require further processing, post-translational modification, or export out of the cell. The flexible protein then wriggles and moves about before spontaneously folding itself into the correct shape. Some proteins require a chaperone to help guide them into their final form, but most proteins form through their random movements.
The amazing part is that it isn’t entirely the random, chaotic process it seems to be. If a 100-amino-acid protein were to attempt to fold into the correct conformation purely randomly, that is, by moving around until every amino acid is in the exact spatial arrangement of the final protein, it would take 1.6 × 10²⁷ years. Since smaller proteins can fold within a second, this obviously isn’t the whole story.
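That figure can be reproduced with a back-of-the-envelope calculation. The sketch below assumes three possible conformations per amino acid and a sampling rate of 10¹³ conformations per second. Both are commonly used illustrative values rather than numbers stated in this article, and the function name is made up for the example.

```python
# Rough estimate of how long a purely random conformational search would take.
# ASSUMPTIONS (not from the article): 3 conformations per residue,
# 1e13 conformations sampled per second.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def random_search_years(n_residues, conformations_per_residue=3,
                        samples_per_second=1e13):
    """Years needed to try every conformation once, one at a time."""
    total_conformations = conformations_per_residue ** n_residues
    seconds = total_conformations / samples_per_second
    return seconds / SECONDS_PER_YEAR


years = random_search_years(100)
print(f"{years:.1e} years")
```

Under these assumptions the result lands at roughly 1.6 × 10²⁷ years, and each additional amino acid triples it.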
In the end, the governing forces of protein folding are the laws of thermodynamics. One important equation within chemistry and biochemistry is the Gibbs free energy equation:
ΔG = ΔH – TΔS
Here, G is Gibbs free energy, H is enthalpy, T is temperature (held constant in this form of the equation), and S is entropy. When a process makes the change in Gibbs free energy negative (that is, when the system releases free energy), it’s said to be energetically favorable, and it will occur spontaneously. In other words, a process that lowers the system’s free energy can proceed without any outside input of energy.
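To make the sign convention concrete, here is a minimal sketch that evaluates the equation for one set of numbers. The enthalpy and entropy values are hypothetical, chosen only to show how the sign of ΔG can change with temperature; they are not measurements for any real protein.

```python
# Evaluate dG = dH - T * dS and check whether the process is favorable.
# The numeric inputs below are HYPOTHETICAL, for illustration only.


def gibbs_free_energy_change(delta_h, temperature, delta_s):
    """Return dG in kJ/mol, given dH (kJ/mol), T (K), and dS (kJ/(mol*K))."""
    return delta_h - temperature * delta_s


def is_spontaneous(delta_g):
    """A negative dG means the process is energetically favorable."""
    return delta_g < 0


DELTA_H = -200.0  # kJ/mol, hypothetical
DELTA_S = -0.5    # kJ/(mol*K), hypothetical

dg_room = gibbs_free_energy_change(DELTA_H, 298.0, DELTA_S)
dg_hot = gibbs_free_energy_change(DELTA_H, 450.0, DELTA_S)

print(dg_room, is_spontaneous(dg_room))
print(dg_hot, is_spontaneous(dg_hot))
```

With these made-up inputs, ΔG is negative at 298 K and positive at 450 K, so the same process is favorable at the lower temperature and unfavorable at the higher one.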
How these fundamental laws apply to protein folding can be illustrated with an analogy from Richard Dawkins’ “The Blind Watchmaker,” in which he describes how a monkey typing on a keyboard could eventually write a sentence from “Hamlet.” It would, of course, take a very long time. However, if the monkey types randomly while you keep every correct keystroke, it would only take on the order of 3,000 keystrokes to write a 30-character line.
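The gap between the two strategies can be sketched in a short simulation. This is an illustration under stated assumptions, not Dawkins’ own program: the keyboard is assumed to have 27 keys (A to Z plus space), the target is the 28-character “Hamlet” line commonly used for this thought experiment, and the function name is hypothetical. The keystroke count grows with the number of keys, so a larger keyboard gives a larger total.

```python
import random
import string

# ASSUMPTIONS: a 27-key keyboard and a 28-character target line.
ALPHABET = string.ascii_uppercase + " "
TARGET = "METHINKS IT IS LIKE A WEASEL"


def keystrokes_with_locking(target, alphabet, rng):
    """Type random keys, keeping each correct character once it appears."""
    keystrokes = 0
    for wanted in target:
        # Keep typing for this position until the right key comes up.
        while True:
            keystrokes += 1
            if rng.choice(alphabet) == wanted:
                break
    return keystrokes


rng = random.Random(0)  # fixed seed so the run is repeatable
locked = keystrokes_with_locking(TARGET, ALPHABET, rng)

# Without locking, the whole line must come out right in a single attempt,
# so the expected number of attempts is (number of keys) ** (line length).
attempts_without_locking = len(ALPHABET) ** len(TARGET)

print("keystrokes with locking:", locked)
print("expected attempts without locking:", attempts_without_locking)
```

On average, the locking strategy needs about one keystroke per key on the keyboard for each character of the line, while the all-at-once strategy needs an astronomically large number of attempts.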
Within the context of protein folding, biochemists refer to this process as cumulative selection. As a protein flails around with reckless abandon, any amino acids that fall into their most energetically favorable position stay in that position.
Although we know how protein folding happens and the basic rules that govern the process, we still can’t always predict what structure will result from a given sequence of amino acids. This has become one of the most important problems to solve within biochemistry, as it could unlock a whole new genre of protein science.
This mission is at the core of the Critical Assessment of Techniques for Protein Structure Prediction (CASP), a biennial competition in which teams attempt to predict the structure of a protein from just its sequence, increasingly by using artificial intelligence. One competitor, AlphaFold 2, showed promising results in the 2020 competition (CASP14), being able to predict the correct location (within a certain threshold) of each amino acid with about 90 percent accuracy. While this isn’t perfect, it is a huge step forward in the field. Who knows how accurate the predictions will be in 2022?
Sources
Nature (2020). DOI: 10.1038/s41586-019-1923-7