In a rapidly changing world, some may take comfort in the constants; however, it has become apparent that even our DNA may be on the move. Recent studies have shown that the human genome contains roughly 100,000 pieces of endogenous retroviruses (ERVs), making up about 8 percent of our genome. ERVs are pieces of DNA that are of viral origin. When a retrovirus infects a cell, it copies its RNA genome into DNA and inserts that DNA into the host's genome; if the infected cell is a germline cell (one that gives rise to eggs or sperm), the inserted viral DNA can be passed down through generations. Whether this viral DNA can still replicate, and what that would imply, is largely unknown. Some ERVs have been linked to cancer and to processes in pregnancy; however, the functions of the majority of ERVs are still being explored.
ERVs can have both beneficial and harmful effects on humans. For instance, in the twentieth century, certain ERVs were discovered to have pathogenic effects as transmissible agents of cancer. On the other hand, they can benefit the health of a developing fetus: one ERV helps build a cell layer around the fetus that protects it from toxins in the mother’s blood. Most of the ERVs discovered are no longer functional; however, exploring their potential has revolutionized immunology research, contributing to vaccines, precautionary measures for cancer, and new ways for scientists to understand cancer biology.
Most known ERVs were discovered because of their similarity to modern viruses, but scientists have hypothesized that not all ERVs share this similarity. The viruses behind such ERVs have not been identified by conventional techniques; they have most likely gone extinct or remain unknown. Finding them could unlock a wealth of knowledge as scientists explore the mechanisms viruses used in the past and how those mechanisms might shape viruses in the future.
Computer scientists are exploring how to recognize more of these ERVs using machine learning. One group of researchers trained a machine learning model on a single class of viruses. Trained on known non-retroviral RNA virus elements, the model learns to differentiate between viral and non-viral DNA. When given an entire genome, it can then flag the stretches that look viral. In a 2021 study, 100 potential pieces of viral DNA were identified. After filtering out false positives and viruses already discovered, researchers were left with a grand total of one ERV; even so, the lone ERV has proven very valuable.
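To make the train, scan, and filter workflow concrete, here is a toy sketch in Python. It is not the researchers' method: the article does not say which algorithm or features they used, so this example assumes a simple nearest-centroid classifier over 3-mer frequencies, and every name in it (`kmer_profile`, `NearestCentroidClassifier`, `scan`, `drop_known`) and every sequence is hypothetical.

```python
from collections import Counter
from itertools import product
import math

K = 3  # k-mer length; an illustrative choice, not taken from the study
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_profile(seq):
    """Return the normalized k-mer frequency vector of a DNA string."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[k] for k in KMERS) or 1  # avoid division by zero
    return [counts[k] / total for k in KMERS]

def centroid(seqs):
    """Average the k-mer profiles of several training sequences."""
    profiles = [kmer_profile(s) for s in seqs]
    return [sum(col) / len(profiles) for col in zip(*profiles)]

def distance(a, b):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class NearestCentroidClassifier:
    """Toy stand-in for the trained model (an assumption, not the real one)."""

    def fit(self, viral, non_viral):
        self.viral_c = centroid(viral)
        self.host_c = centroid(non_viral)
        return self

    def is_viral(self, seq):
        p = kmer_profile(seq)
        return distance(p, self.viral_c) < distance(p, self.host_c)

def scan(genome, clf, window=12, step=12):
    """Slide a window over the genome and collect windows classified as viral."""
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        if clf.is_viral(genome[start:start + window]):
            hits.append((start, start + window))
    return hits

def drop_known(hits, known):
    """Discard candidates that overlap already-annotated elements."""
    return [h for h in hits
            if not any(h[0] < end and start < h[1] for start, end in known)]

# Made-up training data: GC-rich "viral" and AT-rich "host" sequences.
viral_examples = ["GCGCGCGCGCGC", "CGCGCGCGCGCG"]
host_examples = ["ATATATATATAT", "TATATATATATA"]
clf = NearestCentroidClassifier().fit(viral_examples, host_examples)

genome = "ATATATATATAT" + "GCGCGCGCGCGC" + "ATATATATATAT"
hits = scan(genome, clf)              # candidate viral windows
novel = drop_known(hits, known=[])    # nothing annotated yet in this toy case
print(hits, novel)
```

A real pipeline would work on far longer windows, use a much richer model, and need a separate step to weed out false positives, which is where most of the 100 candidates in the study fell away.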
When examined, the newly discovered ERV was found at a matching insertion site in chimpanzees and marmosets, meaning the insertion occurred about 43 million years ago, before the lineages leading to humans and marmosets diverged. Moreover, the finding suggests that the human genome may contain a wide range of ERVs unlike any modern virus. Discovering this novel ERV led to new opportunities for computer scientists to find different ERVs, either by training the machine learning algorithm on viruses beyond one specific class or by exploring the genomes of other animals. The genomes of animals such as bats and rats, known to be common vectors for disease, are starting to be explored.
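The dating argument can be sketched in a few lines of Python: an insertion shared at the same site by several species must predate their last common ancestor, so the deepest split among those species gives a lower bound on its age. The function and table names are hypothetical. The marmoset figure is the one quoted above, while the chimpanzee figure is only a rough illustrative value that does not come from the article.

```python
# Approximate divergence times from the human lineage, in millions of years.
# "marmoset" uses the 43 million years quoted in the article; "chimpanzee"
# is a rough illustrative assumption, not a figure from the article.
DIVERGENCE_FROM_HUMAN_MYA = {"chimpanzee": 6.5, "marmoset": 43.0}

def min_insertion_age_mya(species_sharing_insertion):
    """Lower bound on insertion age: the deepest split among sharing species."""
    ages = [DIVERGENCE_FROM_HUMAN_MYA[s] for s in species_sharing_insertion]
    return max(ages) if ages else 0.0

print(min_insertion_age_mya(["chimpanzee", "marmoset"]))
```

If the insertion had been found only in humans and chimpanzees, the same reasoning would give a much younger lower bound.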
The discovery of ERVs has revolutionized the fields of immunology and pathology. ERVs present scientists with the opportunity to explore how viruses have evolved over time. The use of machine learning to discover ERVs not detected by conventional homology techniques gives scientists further insight into the diversity of ancient and modern viruses.
Image Source: Max Pixel