Protein Sequence Design Tools: From the Three-Tool Workflow to the AI Agent Dry-Wet Closed Loop
Published on May 18, 2026

After the 2024 Nobel Prize in Chemistry was awarded to David Baker, Demis Hassabis, and John Jumper, protein design truly went mainstream. AlphaFold, RFdiffusion, and ProteinMPNN began to appear frequently at academic conferences, in funding news, and even in government reports.
But anyone who has used these tools knows an awkward reality: the computations keep getting more accurate, yet projects still stall at "validation."
ProteinMPNN can design a sequence in a few seconds. But can that sequence actually be expressed, does it fold properly, and is it active? Answering these questions means handing off to another group of people, another set of equipment, and another budget cycle. After a two-to-three-week wait for results, a failure is hard to attribute: it could lie in the computation or in the experiment.
This gap sits against a backdrop of rapid growth: the global protein engineering market is projected to exceed $4.25 billion by 2026, with a compound annual growth rate of over 16%. Capital is pouring in, papers are multiplying, and tools are iterating. The real turning point for the industry is no longer "can AI design proteins" but "who can efficiently link design with validation."
That is the question this article addresses: as protein sequence design tools evolve, what new paradigms is the industry exploring?

1. The Technological Evolution of Protein Sequence Design Tools: From 'Guessing Sequences' to 'Creating Proteins'
Let's briefly review the technological trajectory in this field.
Traditional protein sequence design has long relied on directed evolution and high-throughput screening, which require synthesizing and testing tens of thousands of sequences, with cycles lasting months or even years and success rates of only 0.1%-1%. Deep learning changed the landscape: ProteinMPNN uses a message-passing neural network to generate sequences in seconds, with a sequence recovery rate (the fraction of positions where the designed residue matches the native one) exceeding 52%; RFdiffusion adds the ability to 'design backbones from scratch,' turning de novo protein design into a systematic engineering process.
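To make the sequence recovery figure concrete, here is a minimal sketch of how the metric is usually computed: the share of positions at which a designed sequence carries the same residue as the native one. The function name and the two ten-residue sequences are made up for illustration, and the sketch assumes the sequences are already aligned and of equal length.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native residue."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Made-up sequences, for illustration only (not real design output).
native = "MKTAYIAKQR"
designed = "MKSAYLAKQK"
recovery = sequence_recovery(designed, native)
print(f"{recovery:.0%}")  # 7 of 10 positions match -> 70%
```

A recovery figure says how closely a design resembles the native sequence; it does not by itself say whether the design expresses or folds.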
Consequently, a classic three-step workflow gradually took shape: RFdiffusion (generate backbone) → ProteinMPNN (fill in sequences) → AlphaFold (validate structure). This process is widely used in the research community and has spawned modular design pipelines like ProteinDJ.
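The three-step workflow can be sketched as a simple chain of stages. This is only an illustration under stated assumptions: the three stage functions are hypothetical stand-ins passed in as callables, not the real interfaces of RFdiffusion, ProteinMPNN, or AlphaFold, and the confidence cutoff is an assumed filtering rule that the article does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    backbone_id: str
    sequence: str
    confidence: float  # predicted-structure confidence, assumed 0-100 scale


def run_pipeline(
    generate_backbones: Callable[[int], List[str]],
    design_sequences: Callable[[str, int], List[str]],
    predict_confidence: Callable[[str], float],
    n_backbones: int,
    seqs_per_backbone: int,
    min_confidence: float,
) -> List[Candidate]:
    """Chain backbone generation, sequence design, and structure checking."""
    kept = []
    for backbone in generate_backbones(n_backbones):
        for seq in design_sequences(backbone, seqs_per_backbone):
            score = predict_confidence(seq)
            if score >= min_confidence:  # assumed filter, not from the article
                kept.append(Candidate(backbone, seq, score))
    return sorted(kept, key=lambda c: c.confidence, reverse=True)


# Hypothetical stand-ins so the sketch runs without any external tool.
def fake_backbones(n: int) -> List[str]:
    return [f"bb{i}" for i in range(n)]


def fake_sequences(backbone: str, n: int) -> List[str]:
    return ["MK" + "A" * j for j in range(n)]


def fake_confidence(seq: str) -> float:
    return 60.0 + 10.0 * len(seq)


results = run_pipeline(fake_backbones, fake_sequences, fake_confidence,
                       n_backbones=2, seqs_per_backbone=3, min_confidence=85.0)
for c in results:
    print(c.backbone_id, c.sequence, c.confidence)
```

Everything in this chain is computational: nothing in it tells you whether a surviving candidate will express or function in the lab.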
However, it has a fundamental flaw: it covers only computation, not validation. Whether a computed sequence can be expressed, folds, and functions in the lab is a question left to another group of people, another set of equipment, and another budget. The feedback loop is extremely long: one iteration from sequence design to wet lab validation often takes weeks or even months.
Meanwhile, many protein sequence design tools are released as independent modules and lack task-level intelligent scheduling. A complete R&D project requires manually switching between 5-8 tools, from structure prediction to sequence generation to druggability evaluation, each with its own environment, format, and parameter system. This creates a paradox: we have unprecedented computational power, yet we are still held back by fragmented tools and validation gaps. The true efficiency bottleneck in protein design lies not in computation, but in how efficiently 'design results' are turned into 'experimental feedback.'
2. What the Industry Is Trying: From 'Tool' to 'Closed Loop'
Having noticed this contradiction, the industry is showing a clear paradigm shift: from developing 'single design tools' to building 'integrated design-verification platforms.' Since 2025, the academic community has been exploring frameworks for AI-assisted protein design. In April 2026, 'Big Zero Bay' released a product: Matwings Technology's conversational protein R&D agent MatwingsVenus™ (Xiaowu™). From an industry observer's perspective, what makes this product interesting is that it does not set out to be 'another protein sequence design tool.' Instead, it integrates design, verification, and expert knowledge in a conversational agent, closing the full loop from design to verification.
The approach works as follows: users state their R&D requirements in natural language, and the agent automatically breaks them down into tasks, drawing on more than 200 integrated protein design tools and a protein database with billions of real labeled samples to complete the computational design. Then, through the platform's built-in communication mechanism, the design results are sent directly to an automated laboratory, where robots handle sample preparation, protein purification, and functional testing. The experimental data are automatically fed back to the AI model, starting the next round of optimization. This is the 'design as verification, verification as iteration' dry-wet closed loop.
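The design, test, and feedback cycle described above can be sketched as a toy loop. This is not how MatwingsVenus works internally, which the article does not describe; it is a minimal simulation under loud assumptions: the 'dry' step is random single-point mutation, the 'wet' step is a made-up scoring function standing in for a measured readout, and all names (`propose_variants`, `closed_loop`, `toy_assay`, `HIDDEN_TARGET`) are hypothetical.

```python
import random
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def propose_variants(parent: str, n: int, rng: random.Random) -> List[str]:
    """'Dry' step: propose n single-point mutants of the current best sequence."""
    variants = []
    for _ in range(n):
        pos = rng.randrange(len(parent))
        new_aa = rng.choice(AMINO_ACIDS)
        variants.append(parent[:pos] + new_aa + parent[pos + 1:])
    return variants


def closed_loop(start: str, assay: Callable[[str], float], rounds: int,
                batch_size: int, seed: int = 0) -> List[Tuple[str, float]]:
    """Alternate design and measurement; each round starts from the best result so far."""
    rng = random.Random(seed)
    best_seq, best_score = start, assay(start)
    history = [(best_seq, best_score)]
    for _ in range(rounds):
        for variant in propose_variants(best_seq, batch_size, rng):
            score = assay(variant)  # 'wet' step: stand-in for an experimental readout
            if score > best_score:
                best_seq, best_score = variant, score
        history.append((best_seq, best_score))
    return history


# Toy assay: similarity to a hidden target. Purely illustrative, not a real measurement.
HIDDEN_TARGET = "MKVLAAGIDE"


def toy_assay(seq: str) -> float:
    return sum(a == b for a, b in zip(seq, HIDDEN_TARGET)) / len(HIDDEN_TARGET)


history = closed_loop("MAAAAAAAAA", toy_assay, rounds=5, batch_size=20)
for round_no, (seq, score) in enumerate(history):
    print(round_no, seq, f"{score:.1f}")
```

The point of the sketch is the shape of the loop: every measurement immediately informs the next batch of designs, so the time per round is set by the assay rather than by handoffs between teams.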
This is worth paying attention to for two reasons. From an engineering perspective, it compresses the project management systems, email exchanges, and scheduling delays that used to sit between the computational and experimental teams into one continuous automated process. It also represents a new R&D philosophy: protein sequence design tools should not stop at the computational level, but should be able to 'run the full process.'
3. Understanding the Value of a Closed Loop Through a Case Study
The collaboration between Matwings Technology and Jinsai Pharmaceuticals is a case worth analyzing. The goal was to modify an alkaline protein used under extreme industrial conditions. Using AI-directed evolution, the team completed the modification in just four months: the protein's alkali resistance at pH 13-14 increased fourfold, its lifespan doubled, and production was successfully scaled to 5,000 liters. This became the world's first case of industrialized implementation of a protein designed by a large model, saving the company tens of millions of yuan annually. The key here is not that "AI designed a sequence," but that the entire chain from design to verification to scale-up production was completed continuously on one platform. The traditional "design, send email, queue for experiment" model cannot match this closed-loop efficiency.
4. A Trend in Progress: The "Democratization" of Protein Design
In 2024, the Nobel Prize showed us the scientific value of protein design; in 2025-2026, the market is validating its industrial value. As protein sequence design tools shift from "dispersed modules" to "intelligent closed loops," a deeper change is under way: protein R&D is moving from "large platform-driven" to "personally accessible." Traditionally, a complete protein design-verification pipeline required top-tier computing resources, a professional wet lab team, and years of accumulated experience, which made it a capability essentially exclusive to large pharmaceutical companies and top research institutes. Now, through the "AI design + shared automated laboratory" model, this capability is being packaged as infrastructure that can be called in natural language. According to industry analysis, global AI-driven protein design market revenue is about 3.586 billion yuan in 2025 and is expected to approach 10.25 billion yuan by 2032, a compound annual growth rate of approximately 15.8%. What drives this growth is precisely the democratization of design capability: more and more small and medium-sized innovators are entering protein R&D, and what they need is a tool that delivers "design" and "verification" together.
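The market figures quoted above can be sanity-checked with the standard compound annual growth rate formula. The sketch below assumes the growth is counted over the seven years from 2025 to 2032; the source report may use a different base year or rounding, which would explain a small difference from the quoted rate.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by a start value, an end value, and a span in years."""
    return (end_value / start_value) ** (1 / years) - 1


# 3.586 billion yuan (2025) growing to 10.25 billion yuan (2032).
cagr = implied_cagr(3.586, 10.25, 2032 - 2025)
print(f"{cagr:.1%}")
```

Under that assumption the implied rate comes out at roughly 16%, in line with the approximately 15.8% cited.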
The next competitive dimension for protein sequence design tools is no longer "whose algorithm is more accurate," but "who can run the entire chain from design to verification faster, more reliably, and more accessibly." Matwings Technology is one of the explorers in this direction. We are also continuously monitoring industry changes and the practices of peers. If this direction interests you, feel free to share your thoughts in the comments: what protein sequence design tools have you used, and how do you see dry-wet closed loops being implemented?