Evo 2 AI Model Reads the Language of DNA to Fight Disease

Source: Nature | Overview of model architecture, training procedure, datasets and evaluations for Evo 2.

Key Points:

  • Scientists launched Evo 2, the largest biological AI model to date.
  • The model was trained on the DNA of more than 100,000 different species.
  • It can accurately identify disease-causing mutations, like those in breast cancer genes.
  • The software is fully open-source, giving researchers worldwide free access.

A powerful new artificial intelligence model called Evo 2 is giving scientists an unprecedented look at the building blocks of life. Published recently in the journal Nature, this machine learning tool can read and understand DNA sequences across more than 100,000 species. Researchers from the Arc Institute and NVIDIA teamed up with universities in California to build this massive project.

Evo 2 acts like a master translator for the language of genetics. The team trained the model on over 9.3 trillion nucleotides, which are the basic units of DNA and RNA. By analyzing this vast amount of data, the AI learned to spot hidden patterns created by millions of years of evolution. It understands relationships between distant parts of a genome that would take human researchers years to find through traditional experiments.

The practical applications are already impressive. During tests, Evo 2 looked at variants of the BRCA1 gene, which is linked to breast cancer. The model predicted with over 90 percent accuracy whether a specific mutation was harmless or potentially dangerous. This kind of speed and precision could save millions of dollars in research costs and help doctors develop new medicines much faster.
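For readers curious how a sequence model can judge a mutation at all, one common approach is to compare how likely the model finds the original sequence versus the mutated one. The sketch below illustrates that idea with a tiny stand-in model that only counts which nucleotide tends to follow which. It is not Evo 2 and does not use its code; the function names, the toy training sequences, and the scoring rule are all assumptions made for illustration, and the article does not say exactly how Evo 2 scores BRCA1 variants.

```python
import math

ALPHABET = "ACGT"

def train_markov(sequences, pseudocount=1.0):
    """Estimate P(next base | current base) from example sequences.

    Toy stand-in for a genomic language model (hypothetical, not Evo 2).
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in ALPHABET}
    return probs

def log_likelihood(seq, probs):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(probs[cur][nxt]) for cur, nxt in zip(seq, seq[1:]))

def variant_score(reference, position, alt_base, probs):
    """Log-likelihood of the mutated sequence minus that of the reference.

    A strongly negative score means the toy model finds the variant
    less typical than the reference; it is not a clinical prediction.
    """
    variant = reference[:position] + alt_base + reference[position + 1:]
    return log_likelihood(variant, probs) - log_likelihood(reference, probs)

# Made-up training data and reference sequence, for illustration only.
training = ["ACGTACGTACGT", "ACGTACGTACGA", "CGTACGTACGTA"]
model = train_markov(training)
reference = "ACGTACGT"
score = variant_score(reference, 3, "A", model)  # swap the T at index 3 for an A
print(f"variant score: {score:.3f}")
```

In this toy setup the swap breaks a pattern the model saw repeatedly, so the score comes out negative. A real genomic model works on the same principle but learns far richer patterns from trillions of nucleotides, and turning a score into a "harmless" or "dangerous" call still requires validation against laboratory data.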

Beyond just reading DNA, the software can actually write it. Researchers used Evo 2 to design brand new synthetic genomes, including functional bacteriophages that could one day treat antibiotic-resistant bacteria. The team also hopes the tool will help engineers build highly targeted gene therapies that only activate in specific cells, reducing side effects for patients.

To power this massive undertaking, the developers used a new AI architecture called StripedHyena 2 and trained the system for months on a supercomputer running over 2,000 NVIDIA graphics cards.

Importantly, the creators made safety a priority. They intentionally kept data regarding dangerous human pathogens out of the training set to prevent the AI from creating harmful biological materials. The entire model, including its code and training data, is now freely available to the public, offering scientists worldwide a new tool to tackle pressing health challenges.

Source: Nature (2026).

EDITORIAL TEAM
Al Mahmud Al Mamun leads the TechGolly editorial team. He served as Editor-in-Chief of a world-leading professional research magazine. Rasel Hossain supports the team as Managing Editor. Our team is made up of technologists, researchers, and technology writers. We have substantial expertise in Information Technology (IT), Artificial Intelligence (AI), and Embedded Technology.