Stumbling through the dark: How thousands of unknown proteins could rewrite biology
November 12, 2025
By Saumya Sawant
Is the absence of something evidence for its existence? For a scientist studying the dark proteome, the answer just might be yes.
“A dark protein is one we know nothing about,” explains Michael Levitt, a 2013 Nobel laureate in chemistry who, with Martin Karplus and Arieh Warshel, pioneered some of the first computer models capable of predicting protein structure, function, and even interactions with other substances such as drugs. “We do not recognize it at all as being similar to a protein that we do know something about.”
Dark proteins first came to light after the Human Genome Project published its initial draft of the human genome sequence in 2001. The project marked a historic milestone in science. And yet, one finding raised more questions than it answered: why did only a fraction of the human genome (roughly 1-2%) actually code for proteins? Was the remaining 98% just junk?
Biologists puzzled over this question. As Kári Stefánsson, chief executive of deCODE genetics, put it, “Evolution has absolutely no tolerance for junk.” In other words, if that 98% of the human genome existed, nature must have had a very good reason for it.
“The dark proteome, which had once been defined by its absence, has now become the newest frontier of science.”
A breakthrough came in 2009, when Jonathan Weissman and Nicholas Ingolia stumbled onto a promising lead. They had set out to catalog cellular proteins using a method they had developed called ribosome profiling, which measures protein synthesis by tracking ribosome activity. In theory, their results should have matched proteins that had already been discovered and cataloged. Instead, they found “thousands upon thousands” of unknown proteins that “map to portions of the genome that weren’t thought to produce proteins.” These, of course, were the dark proteins.
Dark proteins, however, are not a monolith. Some earn the name because their structure and function are entirely unknown; others have dark “regions,” parts of their 3D structure that remain unknown even though the rest of the protein has been mapped. It’s hypothesized that more than 50% of the proteins in eukaryotic organisms and viruses could belong to the dark proteome, posing an intriguing challenge to biologists and chemists alike. Could these proteins be responsible for deadly diseases? Or conversely, could these same proteins be used to defend against harmful pathogens and infections? While the scientific community is eager to study them, the endeavor is far more complicated than previously imagined.
For one, while typical proteins in the human body are hundreds of amino acids long, many members of the dark proteome belong to a group known as microproteins. As the name implies, these proteins are tiny (fewer than 100 amino acids long) and difficult to identify using traditional methods such as mass spectrometry. To complicate matters further, microproteins can be masked by larger proteins, which are abundant within cells and often saturate collected samples.
Yet technologies such as AlphaFold, an advanced AI system capable of predicting 3D protein structures from their amino acid sequence alone, have greatly assisted dark proteome research. Recently, it predicted the structures of more than half of the dark proteins present in the human proteome. And while some limitations remain (for instance, “orphan” proteins with no close evolutionary relatives often yield low-confidence predictions), each updated version of the software pushes the boundaries a little further, making it increasingly plausible that the full set will one day be mapped.
Ultimately, scientific understanding of the dark proteome remains limited in scope. However, the invention of novel technologies and a growing understanding of the role these proteins play in a variety of diseases continue to drive forward the work of those seeking to shed light on perhaps the darkest part of biology. In this way, the dark proteome, which had once been defined by its absence, has now become a visible and novel frontier of science.
