How AI Reshapes Biological Research — From 'Finding a Needle in a Haystack' to 'Precision Engineering'

Published on June 1, 2026

If the past 100 years of biological research were like "finding a needle in the ocean," then the intervention of AI is like equipping every biologist with a sonar scanner.

The 2024 Nobel Prize in Chemistry recognized AI-driven protein science, marking the moment a technology once regarded by biologists as an "outsider tool" officially entered the highest hall of the life sciences. David Baker was honored for computational protein design, including the de novo design of entirely new proteins, while Demis Hassabis and John Jumper were honored for AlphaFold, which addressed the protein structure prediction problem that had troubled the scientific community for half a century. The award not only affirms the three scientists but also crowns the entire AI-driven paradigm in biological research.

So, what fundamental changes has AI brought to biological research?


1. The Historic Turning Point of the Scientific Research Paradigm

Hong Liang, a distinguished professor at Shanghai Jiao Tong University and Chief Scientist at Tianwu Technology, once made a profound statement: Artificial intelligence is transforming protein engineering from a complex 'science' that relies on experience and luck into a predictable and efficient 'engineering.'


This statement captures the essence of how AI empowers biological research. The traditional research model can be summarized as 'hypothesis-driven, experiment-verified': scientists propose hypotheses, design experiments, await results, analyze data, revise hypotheses, and then repeat the process. A complete research cycle can take several years. In protein engineering especially, finding a single high-performance variant often requires constructing and screening mutant libraries of thousands to tens of thousands of variants, consuming a large amount of time and resources.


The intervention of AI reconstructs this model into a closed-loop process of 'data-driven, model-predicted, automated validation.' Researchers no longer need to blindly explore in the dark, but can obtain precise, 'think-tank' style guidance through AI models to rapidly identify optimal solutions in the vast molecular space.


Hong Liang's statement is also supported by top academic journals. A 2025 review published in BioDesign Research proposed a unified AI-first framework, extending enzyme engineering from single-enzyme modeling to multi-enzyme pathway design, integrating multiple dimensions such as sequence, structure, and reaction environment.


2. The Three Core Capabilities of AI Empowering Biological Research

Capability 1: AI Enzyme Mining, Precisely Fishing from the "Dark Matter" of Metagenomes

The maturation of metagenomics technology has enabled biologists to sequence the DNA of all microorganisms in the environment directly, without the need to culture them first. However, the massive amount of data has also brought new challenges—we have discovered hundreds of millions of unknown protein sequences, yet the vast majority of their functions remain "unknown." This "metagenomic dark matter" is like a huge gold mine, but we lack an efficient "shovel" to extract it.


In 2026, an important study published in PNAS demonstrated the groundbreaking application of AI in this field. The team developed the Horizyn-1 machine learning model, which can directly recommend suitable enzyme sequences based on a given chemical reaction and has completed comprehensive experimental validation in several tasks, including orphan reaction de-orphanization, prediction of enzyme promiscuous activities, and non-natural biochemical transformations.


Meanwhile, at the beginning of 2026, ACS Catalysis released a major review titled "Machine Learning-Driven Enzyme Mining: Opportunities, Challenges, and Future Perspectives," systematically outlining how machine learning has enabled a paradigm shift in the entire enzyme mining process—upgrading the traditional sequence homology search methods to data-driven precise function prediction. The review pointed out that current machine learning models are now capable of high-throughput prediction of multiple key functional parameters of enzymes, including EC numbers, gene ontology terms, substrate specificity, solubility, and thermal stability.


Capability 2: AI Mutation Prediction, Finding the 'Optimal Solution' in an Exponentially Expanding Combinatorial Space

If enzyme mining solves the problem of 'which enzyme to find,' then mutation prediction solves the problem of 'how to make the enzyme better.'

A protein 300 amino acids long has 300 × 19 = 5,700 possible single-point mutants, about 16 million double-point mutants, and roughly 3.1×10¹⁰ triple-point mutants. Traditional directed evolution can only test these variants experimentally, one by one, so research efficiency is extremely low.
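The counts above follow from choosing which positions to mutate and which of the 19 alternative residues to place at each one. A minimal sketch of that arithmetic (the function name variant_count is our own, used only for illustration):

```python
from math import comb

def variant_count(length: int, k: int, alphabet: int = 20) -> int:
    """Number of distinct variants with exactly k substituted positions.

    Choose k of the `length` positions, then pick one of the
    (alphabet - 1) other residues at each chosen position.
    """
    return comb(length, k) * (alphabet - 1) ** k

# Single, double, and triple mutants of a 300-residue protein.
counts = {k: variant_count(300, k) for k in (1, 2, 3)}
for k, n in counts.items():
    print(f"{k}-point variants: {n:,}")
```

Each additional mutated site multiplies the count by a factor in the thousands, which is why exhaustive screening stops being practical after the first step or two.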

AI-driven mutation effect prediction models are changing this situation. In a study published in Nature Communications in 2025, researchers used a deep learning-guided directed evolution algorithm to search a combinatorial space on the order of 10³⁵ variants while screening only about 4,000 mutants, a space far too large to ever enumerate experimentally. Ultimately, the team increased the activity of green fluorescent protein 73-fold, reaching nearly twice the activity of the current gold standard.

Another study published in Nature Communications built a machine learning-guided cell-free expression platform, constructing a predictive model from about 11,000 reaction tests on just 1,217 enzyme variants. The platform increased the activity of amide synthetases by 1.6- to 42-fold for the production of nine small-molecule drugs.
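The general idea behind these studies, letting a model trained on a small number of measurements decide which variants to test next, can be illustrated with a toy loop. This is only a sketch under strong assumptions, not the algorithm used in either paper: the "assay" is a made-up additive fitness landscape, the surrogate model is a simple per-site average, and every name here (make_landscape, fit_site_model, guided_evolution, and so on) is hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def make_landscape(length, rng):
    """Hypothetical hidden fitness: each residue at each site adds a fixed amount."""
    table = [{aa: rng.gauss(0, 1) for aa in AMINO_ACIDS} for _ in range(length)]
    return lambda seq: sum(table[i][aa] for i, aa in enumerate(seq))

def mutate(seq, n_mut, rng):
    """Return a copy of seq with n_mut randomly chosen sites substituted."""
    residues = list(seq)
    for i in rng.sample(range(len(residues)), n_mut):
        residues[i] = rng.choice([aa for aa in AMINO_ACIDS if aa != residues[i]])
    return "".join(residues)

def fit_site_model(measured):
    """Toy surrogate: mean measured fitness for each (site, residue) pair."""
    sums, counts = {}, {}
    for seq, score in measured.items():
        for key in enumerate(seq):
            sums[key] = sums.get(key, 0.0) + score
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

def predict(model, seq, default):
    """Score a sequence; unseen (site, residue) pairs fall back to `default`."""
    return sum(model.get(key, default) for key in enumerate(seq))

def guided_evolution(start, assay, rounds=4, batch=50, pool=2000, seed=0):
    """Each round: fit the surrogate, rank a candidate pool, assay only the top batch."""
    rng = random.Random(seed)
    measured = {start: assay(start)}
    for _ in range(rounds):
        best = max(measured, key=measured.get)
        model = fit_site_model(measured)
        default = sum(measured.values()) / len(measured)
        # Propose triple mutants of the current best; drop duplicates and known variants.
        proposals = dict.fromkeys(mutate(best, 3, rng) for _ in range(pool))
        candidates = [s for s in proposals if s not in measured]
        candidates.sort(key=lambda s: predict(model, s, default), reverse=True)
        for seq in candidates[:batch]:
            measured[seq] = assay(seq)  # the only "wet-lab" step
    return measured

rng = random.Random(42)
assay = make_landscape(30, rng)
start = "".join(rng.choice(AMINO_ACIDS) for _ in range(30))
measured = guided_evolution(start, assay)
print(len(measured), "variants assayed; best gain:",
      round(max(measured.values()) - measured[start], 2))
```

In the first round the surrogate has seen only one sequence, so the batch is effectively random exploration; later rounds favor substitutions that appeared in better-than-average variants. Real systems replace the per-site average with a deep model and the simulated assay with actual screening, but the budgeted predict-then-test loop has the same shape.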


Capability 3: Protein Language Models — AI That "Understands" the Code of Life

This is one of the most "sci-fi" areas in AI-powered biological research. Simply put, protein language models treat protein amino acid sequences as a "language" to train AI — much like learning English. By analyzing hundreds of millions of protein sequences, protein language models learn the "grammar" and "semantics" within them, enabling them to predict protein functions and even design entirely new proteins.
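To make the "language" analogy concrete, here is a deliberately tiny stand-in for a protein language model: a bigram model that learns which residue tends to follow which, then scores how "grammatical" a new sequence looks. It is a sketch only. Real protein language models are large neural networks trained on hundreds of millions of sequences, and the four training sequences below are made up for illustration, as are the function names.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def train_bigram(sequences):
    """Count how often each residue is followed by each other residue."""
    pair_counts, first_counts = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[a, b] += 1
            first_counts[a] += 1
    return pair_counts, first_counts

def log_likelihood(seq, model):
    """Sum of log P(next residue | previous residue), with add-one smoothing."""
    pair_counts, first_counts = model
    total = 0.0
    for a, b in zip(seq, seq[1:]):
        p = (pair_counts[a, b] + 1) / (first_counts[a] + len(AMINO_ACIDS))
        total += math.log(p)
    return total

# Made-up toy corpus of closely related sequences (not real proteins).
corpus = ["MKTAYIAKQR", "MKTAYLAKQR", "MKSAYIAKQR", "MKTAYIAKHR"]
model = train_bigram(corpus)

seen_like = log_likelihood("MKTAYIAKQR", model)  # follows the learned "grammar"
shuffled = log_likelihood("RQKAIYATKM", model)   # same residues, reversed order
print(seen_like > shuffled)
```

The reversed sequence contains exactly the same amino acids but scores lower, because the model has learned something about order, not just composition. Scaling that idea up is what lets large models rank mutations or propose new sequences.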

A 2025 study available on ScienceDirect used the protein language model Pro-PRIME to simultaneously optimize three enzymatic properties of cyclodextrinase (enhancing transglycosylation activity, reducing hydrolytic activity, and improving regioselectivity), demonstrating the ability of language models to guide enzyme engineering when competing catalytic activities must be balanced.

In China, Hong Liang's research team at Shanghai Jiao Tong University and their collaborators released VenusMine, a new member of the Venus series of large models, in July 2025. The model integrates protein language models with three-dimensional structure analysis and, by learning the implicit mapping between protein sequence, structure, and function, can efficiently identify enzymes with low sequence homology but excellent function in massive protein databases.


3. From Academic Concepts to Industrial Practice: The Exploration of Tianwu Technology

Turning these academic frontiers into industrial practice, Tianwu Technology's independently developed MatwingsVenus™ protein research agent is moving AI-driven biological research from concept to reality.


Dialogic Research—"What You Want Is What You Get"

MatwingsVenus™ is a conversational protein R&D agent. Users express their R&D needs in natural language, as if conversing with a person, and the agent automatically completes the entire process, from literature review and patent search to protein sequence design, achieving true "what you think is what you get."

The system's core capabilities cover the entire AI research chain: AI directed evolution, AI enzyme mining, de novo design, structure prediction, mutation effect prediction, and more. More importantly, it integrates a protein sequence database on the ten-billion scale, covering not only conventional biological sequences but also protein sequences collected from extreme environments such as the deep sea and volcanoes, where organisms withstand high temperatures, high pressure, and strong acids or bases. These proteins, shaped by millions of years of evolution in extreme environments, hold enormous potential for industrial applications.


From academic theory to industrial implementation

In a de novo design project targeting an immune regulatory receptor, Tianwu Technology used its independently developed MatwingsVenus™ platform to obtain dozens of brand-new binder molecules with in vitro cell blocking activity, completing full-process validation of original binder design and demonstrating the strength of AI-driven approaches in innovative protein drug development.

In an engineering application of glycosyltransferases for food-related industries, Tianwu Technology increased the enzyme's total glycosylation activity sevenfold in just four months, improved product specificity from 60% to 98%, and ultimately reduced core material costs by 90%. The case illustrates the efficiency gains of the AI-powered research paradigm: R&D that might have taken 2 to 3 years with traditional methods was compressed to 4 months.


Extreme Technical Reserves

Tianwu Technology has built a dedicated protein dataset containing nearly 9 billion sequences, relying on the “Mingyuan Project” to integrate special functional sequences from extreme environments such as volcanoes and deep-sea trenches. The platform also integrates more than 200 protein design tools, a database with tens of billions of labels, and expert-optimized skills, enabling automated, full-process computational work from target discovery and molecular design to performance prediction.

MatwingsVenus™ (Xiaowu™) was even featured in a special report by Dragon TV and Shanghai Media Group in May 2026, becoming a model of Shanghai's translational basic research, fully demonstrating that the AI-driven paradigm of biological research in China has already established a solid path from academic theory to industrial practice.


4. Outlook: The Next Frontier of AI Biological Research

In March 2026, NVIDIA Digital Biology Labs announced a series of new developments at the GTC conference, systematically demonstrating AI’s capabilities in protein structure and function design. A forward-looking overview published in Nature Communications in May 2026 pointed out that structural biology is entering a new phase, in which generative methods aim to approximate Boltzmann-weighted ensembles and can design high-affinity protein binders from scratch.


By 2025, China’s biomanufacturing industry had reached 1.01 trillion yuan, officially entering the trillion-yuan level; the global biomanufacturing industry reached 1.219 trillion USD, showing steady overall growth. One of the core engines driving this trillion-level growth is precisely the new AI-enabled paradigm in biological research.


From “finding a needle in a haystack” to “AI-precise navigation,” from “ten years to sharpen a sword” to “results in months” — AI is redefining the efficiency and boundaries of biological research. And for every researcher reading this article, this transformation has only just begun. The future of biological research may no longer be about “how hard you work,” but rather “how efficiently you collaborate with AI.”