
AI-Directed Evolution: Unlocking the Future Code of Protein Design

Published on May 20, 2026


I. The "Slowness" and "Difficulty" of Protein Engineering

Proteins are the functional elements of life and the core raw materials for industries such as pharmaceuticals, food, and materials. But modifying a protein is never an easy task.


The core method of traditional protein engineering is directed evolution, an approach inspired by natural selection that combines random mutation with high-throughput screening to pick better-performing candidates out of a vast number of variants. Work on this method was recognized with the 2018 Nobel Prize in Chemistry, but its costs are evident:

- Long cycles: It usually takes months or even years for an antibody to move from discovery to preclinical development;

- Extremely low efficiency: The success rate of traditional screening is between 0.1% and 1%, so most experimental resources are consumed by "ineffective variants";

- High professional threshold: It requires the knowledge and experience of senior experts and is highly dependent on large experimental platforms.


A deeper dilemma is that the sequence space evolution has to search is astronomically large. A protein of 300 amino acids has 20³⁰⁰ possible sequences, while experiments can cover only a tiny fraction of them. Traditional directed evolution is essentially random trial in the dark, and its efficiency ceiling is already in sight.
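To put that figure in perspective, the short Python sketch below computes the size of the full sequence space and, for comparison, the number of single and double mutants of one 300-residue parent. The length of 300 comes from the text above; the function names are illustrative only.

```python
import math

ALPHABET_SIZE = 20  # the 20 canonical amino acids


def sequence_space_log10(length: int) -> float:
    """Return log10 of the number of possible sequences of a given length."""
    return length * math.log10(ALPHABET_SIZE)


def k_point_mutants(length: int, k: int) -> int:
    """Count variants that differ from one parent sequence at exactly k sites."""
    # Choose k sites, then one of the 19 other residues at each chosen site.
    return math.comb(length, k) * (ALPHABET_SIZE - 1) ** k


if __name__ == "__main__":
    print(f"20^300 is about 10^{sequence_space_log10(300):.0f}")
    print(f"single mutants: {k_point_mutants(300, 1):,}")
    print(f"double mutants: {k_point_mutants(300, 2):,}")
```

The full space is on the order of 10^390 sequences. Even when staying close to one parent, the count grows from 5,700 single mutants to about 16 million double mutants, which shows how quickly exhaustive screening becomes impractical.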


But what if we could "see"?

Protein Engineering


II. AI-Directed Evolution: From 'Natural Selection' to 'Intelligent Selection'

The core idea of 'AI-directed evolution' is simple: instead of mutating at random, an AI model predicts which mutations at which sites are most likely to improve the target function, narrowing the search from blind, large-scale screening to a short list of targeted candidates.

Between 'easy to say' and 'reliably implemented', however, stand three obstacles: data, models, and validation loops.
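A minimal Python sketch of this idea follows. The scoring function is a toy stand-in (a per-residue lookup table invented for illustration), not the MatwingsVenus model or any real predictor; in practice it would be replaced by a trained sequence-to-function model. The sketch enumerates every single-point mutant of a parent sequence, scores each one, and keeps only the top few for experimental testing.

```python
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical per-residue weights, invented purely for illustration.
TOY_WEIGHTS = {aa: i / 10 for i, aa in enumerate(AMINO_ACIDS)}


def toy_score(sequence: str) -> float:
    """Stand-in for a learned sequence-to-function predictor (assumption)."""
    return sum(TOY_WEIGHTS[aa] for aa in sequence)


def rank_single_mutants(
    parent: str,
    score: Callable[[str], float],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Score all single-point mutants and return the top_k by predicted gain."""
    baseline = score(parent)
    candidates = []
    for pos, wild_type in enumerate(parent):
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue
            mutant = parent[:pos] + aa + parent[pos + 1:]
            # Label in the usual wild-type/position/mutant form, 1-indexed.
            label = f"{wild_type}{pos + 1}{aa}"
            candidates.append((label, score(mutant) - baseline))
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:top_k]


if __name__ == "__main__":
    for label, gain in rank_single_mutants("MKTAY", toy_score, top_k=3):
        print(f"{label}: predicted gain {gain:+.2f}")
```

Only the shortlisted candidates would go to the bench, so the usefulness of the whole approach depends on how well the predictor's ranking agrees with measured function.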


2.1 Data Foundation: Tens of Billions of Sequences Covering Extreme Environments

The upper limit of an AI model's capability depends on the quality and coverage of the training data.

Matwings Technology has built a protein sequence dataset called VenusPod that integrates major databases such as NCBI, UniProt, and MGnify. It has accumulated over 15 billion annotated sequence entries, including 6.5 billion labels of real experimental function, which the company puts at more than 100 times the total number of such labels in global public databases.

A distinguishing feature of this dataset is that it covers not only conventional sequence sources but also protein sequences collected from extreme environments such as the deep sea and volcanoes, where proteins tolerate high temperature, high pressure, and strong acids or bases. These sequences naturally encode how proteins stay functional under harsh conditions: when a model needs to design an alkali-resistant industrial enzyme, they are the most valuable 'textbooks.'

The VenusPod dataset won second place in the 2025 National Finals of the 'Data Elements ×' competition.


2.2 Model Engine: Bridging 'Sequence to Function'

Currently, most work in the protein AI field focuses on structure prediction or de novo design, but the most urgent industry demand is functional optimization — enhancing the activity, stability, and affinity of an existing protein.

Matwings Technology independently developed MatwingsVenus™ (XiaoWu™), a general large model for protein design, which takes a third approach: AI-directed evolution of proteins. The model adopts a Transformer architecture and is pretrained on ultra-large-scale protein data. It learns the complex relationships among protein sequence, structure, and function, enabling direct prediction from 'sequence to function.'

The model has three core capabilities: AI-directed evolution — precisely enhancing multidimensional performance of a known protein; AI enzyme mining — discovering natural proteins with extreme resistance properties from hundreds of millions of sequences; AI de novo design — creating entirely new binding proteins from scratch.

2.3 Dry-Wet Closed Loop: Design is Verification, Verification is Iteration

No matter how accurate AI predictions are, without experimental validation, they are mere "armchair strategies." A long-standing pain point in protein design is the serious disconnect between dry experiments (computational design) and wet experiments (physical validation).


In April 2026, Matwings Technology released the conversational protein R&D intelligent agent MatwingsVenus™ (XiaoWu™), whose core breakthrough lies in bridging this "last mile"—establishing a "conversational dry-wet closed loop":

- Dry experiment (computation): Users state task objectives in natural language, and the agent automatically decomposes the tasks and schedules design, prediction, and analysis capabilities;

- Seamless connection: Design results are automatically imported into plasmid ordering and experimental planning workflows through a self-built communication mechanism;

- Wet experiment validation: The agent drives robots to complete sample preparation, protein purification, and functional testing;

- Data feedback: Experimental results are automatically fed back into the next round of AI design, forming an iterative loop of "computation-driven wet experiments, wet experiments informing computation."
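The four stages above amount to an iterative design-test-learn loop. The Python sketch below simulates one under strong simplifying assumptions: `propose_variants` stands in for the AI design step, `run_assay` stands in for the robotic wet lab (here a deterministic toy function with a hidden optimum), and the feedback step simply carries the best measured variant into the next round. None of these names or behaviors come from the MatwingsVenus platform.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "WYWYW"  # hidden optimum of the toy assay (assumption for the demo)


def run_assay(sequence: str) -> int:
    """Simulated wet-lab measurement: number of positions matching TARGET."""
    return sum(a == b for a, b in zip(sequence, TARGET))


def propose_variants(parent: str, measured: Dict[str, int]) -> List[str]:
    """Simulated design step: all single-point mutants not yet measured."""
    variants = []
    for pos in range(len(parent)):
        for aa in AMINO_ACIDS:
            mutant = parent[:pos] + aa + parent[pos + 1:]
            if mutant not in measured:
                variants.append(mutant)
    return variants


def closed_loop(start: str, max_rounds: int = 10) -> List[str]:
    """Iterate design -> assay -> feedback; return the best sequence per round."""
    measured = {start: run_assay(start)}
    best = start
    history = [best]
    for _ in range(max_rounds):
        for variant in propose_variants(best, measured):
            measured[variant] = run_assay(variant)  # results feed the next round
        new_best = max(measured, key=measured.get)
        if measured[new_best] <= measured[best]:
            break  # no improvement measured, stop iterating
        best = new_best
        history.append(best)
    return history


if __name__ == "__main__":
    print(closed_loop("AAAAA"))
```

Calling `closed_loop("AAAAA")` climbs one matching position per round until it reaches the toy optimum. A real loop would differ mainly in the design step, where a model retrained on the accumulated measurements would replace the exhaustive single-mutant proposal used here.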


This means that the complete R&D capability, which used to be accessible only to large enterprises and major research institutes, is now transforming into infrastructure that individuals can also utilize.


III. Case Validation: Real-World Implementation of AI-Directed Evolution

A technology is ultimately judged by how well it performs in real projects. In AI-directed evolution, Matwings Technology has accumulated successful cases across innovative drugs, bioprocessing, in vitro diagnostics, and more. Examples include alkaline phosphatase activity optimization, directed evolution of a GLP-1 tool enzyme, extremely alkali-resistant single-domain antibodies, domestic production of key raw materials for pancreatitis diagnostics, and de novo design of binders for immunoregulatory receptor targets. All of these have either reached industrial application or entered scale-up production.


3.1 Innovative Drugs: De Novo Design of Immunoregulatory Receptor Targets

Immunoregulatory receptors are high-value targets in innovative biologics R&D, widely involved in cancer, autoimmune, and inflammatory diseases. However, de novo design of binding proteins for this type of target is extremely challenging: the targets are novel with few reference molecules, their surfaces are predominantly polar, and their natural ligands already bind with nanomolar affinity.


Using the MatwingsVenus™ (XiaoWu™) platform, Matwings Technology supplied the target structure and functional requirements, and the agent automatically ran the entire computational workflow, including scaffold screening, interface design, sequence optimization, and druggability prediction, quickly producing high-quality binder sequences. When tested on an automated experimental platform, dozens of the designed molecules showed clear blocking activity in in vitro cell-based assays, indicating both functional inhibition and potential for high affinity. This serves as a landmark case of full-process validation of AI de novo designed binding molecules.

3.2 In Vitro Diagnostics: AI-Enhanced Enzyme Supports Localization of Core Raw Materials for Pancreatitis Diagnosis

The α-O-oligosaccharide EPS-G7 is the key substrate for blood amylase testing in acute pancreatitis. Its supply has long been monopolized by international giants, leaving the domestic market completely reliant on imports.


Matwings Technology, in collaboration with partners, used protein language models to perform multi-objective directed evolution on cyclodextrin glycosyltransferase, simultaneously enhancing transglycosylation activity, reducing hydrolytic activity, and increasing regioselectivity. After only one round of model training and prediction, the optimal triple-point mutant was obtained, raising the proportion of the desired synthesis product from 63% to 98% and improving the transglycosylation/hydrolysis ratio 12-fold. At a 100 L pilot scale, the yield reached 161 g/L with a purity above 99%.
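The source does not say how the three objectives were combined. One common way to handle competing targets, shown here purely as an assumption, is to keep only the Pareto-optimal candidates: variants that no other variant matches or beats on every objective at once. In the Python sketch below, the `Prediction` fields mirror the three objectives named above, and all variant names and numbers in the usage example are made up.

```python
from typing import Dict, List, NamedTuple


class Prediction(NamedTuple):
    transglycosylation: float  # higher is better
    hydrolysis: float          # lower is better
    regioselectivity: float    # higher is better


def dominates(a: Prediction, b: Prediction) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    at_least_as_good = (
        a.transglycosylation >= b.transglycosylation
        and a.hydrolysis <= b.hydrolysis
        and a.regioselectivity >= b.regioselectivity
    )
    return at_least_as_good and a != b


def pareto_front(predictions: Dict[str, Prediction]) -> List[str]:
    """Return names of variants that no other variant dominates."""
    return [
        name
        for name, pred in predictions.items()
        if not any(dominates(other, pred) for other in predictions.values())
    ]


if __name__ == "__main__":
    # Hypothetical predicted values in arbitrary units, for illustration only.
    predicted = {
        "wild_type": Prediction(1.0, 1.0, 0.60),
        "variant_a": Prediction(1.8, 0.9, 0.70),
        "variant_b": Prediction(1.5, 0.3, 0.95),
        "variant_c": Prediction(1.4, 0.5, 0.90),
    }
    print(pareto_front(predicted))
```

With these made-up numbers, `variant_a` and `variant_b` survive because each is best on at least one objective, while `wild_type` and `variant_c` are dominated. A weighted sum of the objectives would be a simpler alternative, at the cost of having to choose the weights in advance.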


This achievement was published in Bioresource Technology, opening a new path for the domestic substitution of core raw materials for pancreatitis diagnosis.

Cyclodextrinase Enables Precision α-O-Oligosaccharide Synthesis


IV. From Tools to Agents: The "Personally Usable" Vision of MatwingsVenus™ (XiaoWu™)


Looking back at the technological evolution path of Matwings Technology, a clear leap from model → tool → agent can be observed:


Phase 1.0 — Large Model Driven: Using general large models to achieve the predictive capability of "sequence-to-function," reducing the R&D cycle from 2–5 years to 2–6 months.


Phase 2.0 — Platform Integration: Integrating over 200 protein design tools, 50 certified experts, and skill packages fine-tuned by 30 domain experts to form a one-stop R&D platform.


Phase 3.0 — Intelligent Agent Closed Loop: Launching the MatwingsVenus™ (XiaoWu™) conversational agent to connect dry and wet loops, realizing an automated R&D model of "design as verification, verification as iteration."


The essential change in MatwingsVenus™ (XiaoWu™) is that it is not a toolbox that stacks functions together, but an "AI scientist" that automatically organizes workflows around task objectives. Users only need to describe their requirements in natural language — for example, "I want to improve the activity of Cas12i3" or "Help me find some new cellulases" — and the system will automatically break down tasks, schedule capabilities, execute verification, and provide feedback for iteration.


V. The Future is Here: The Infinite Possibilities of AI-Directed Evolution

Protein Structure


From drug development to biomaterials, from agricultural enzyme preparations to personalized medicine, AI-driven directed evolution is breaking the boundaries of protein design.

Proteins serve as the underlying 'chips' in many fields, including innovative drugs, in vitro diagnostics, industrial enzymes, and food technology. In China alone, the potential market for domestic substitution of industrial enzymes reaches tens of billions annually. The value of AI-directed evolution lies in delivering protein molecules in a standardized, high-throughput way through 'AI design + automated experiments,' lowering the threshold for protein design from '10 years of professional training' to 'a single conversation.'

The MatwingsVenus™ (XiaoWu™) platform is not just a tool; it is a bridge connecting 'AI creativity' with 'biological possibilities.' Here, every protein will have its own 'evolution script,' and humans only need to define the objectives—the rest is left to intelligence and science.