How exactly do you 'mine' protein sequences? We tried this company's conversational AI.

Published on May 21, 2026

1. Redefining 'Sequence Mining': When R&D Moves from Needle-in-a-Haystack Searching to Precision Targeting

A startup working on plastic-degrading enzymes had been stuck in the lab for ten months. The team had found a reportedly promising, naturally occurring enzyme sequence in the literature and expressed it in their own system, but its activity never met industrial requirements. They tried directed evolution, screening thousands of mutants, yet even the best round improved activity by less than twofold. With the project on the verge of failure, an intern suggested using AI to scan all known metagenomic sequences first. The team leader smiled wryly and shook their head: scanning is easy, but what comes after? Out of thousands of candidate sequences, which ones are worth taking into the wet lab? And who decides?

Metagenomic sequence generation process

This is not a made-up anecdote; it is a dilemma that protein R&D teams face every day.

The problem has never been a shortage of sequences. Public databases already hold hundreds of millions of protein sequences, and the number is still growing exponentially. The real challenge is how to efficiently and accurately mine, from this vast sea, the few 'golden sequences' that can actually solve practical problems. In the past, this relied on expert experience, lengthy trial and error, and luck. Now Tianwu Technology has launched an intelligent agent called MatwingsVenus™ (Xiaowu™) that aims to make this task as approachable as an everyday conversation.


2. What is protein sequence mining?

In simple terms, it means rapidly locating, within massive collections of amino acid sequences, candidates that meet specific functional requirements (such as heat resistance, high catalytic activity, or specific binding), or designing such candidates de novo.

The concept emerged in the post-genome era, when sequencing technology triggered an explosion in protein sequence data while the functions of most sequences remained unknown. Traditional mining relies mainly on homology searches: if sequence A resembles sequence B, whose function is known, A is inferred to have a similar function. But this approach easily misses evolutionarily distant proteins whose sequences have diverged, and it requires extensive manual tuning and wet-lab validation, with cycles measured in years and high costs.
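To make the homology-search idea concrete, here is a minimal sketch of annotation transfer by sequence identity. It is an illustration under stated assumptions, not how any production tool works: tools such as BLAST or HMMER use substitution matrices or profile models plus statistical significance estimates, whereas this sketch uses a plain Needleman-Wunsch global alignment with simple match/mismatch/gap scores. The function names, the 40% identity cutoff, and the toy sequences and labels are all hypothetical.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # a[i-1] against a gap in b
                              score[i][j - 1] + gap)   # b[j-1] against a gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned (non-gap) positions divided by alignment length, in percent."""
    x, y = global_align(a, b)
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


def transfer_annotation(query: str, references: dict, threshold: float = 40.0):
    """Copy the label of the most identical reference, but only above the cutoff."""
    best_label, best_id = None, 0.0
    for seq, label in references.items():
        pid = percent_identity(query, seq)
        if pid > best_id:
            best_label, best_id = label, pid
    return (best_label, best_id) if best_id >= threshold else (None, best_id)


refs = {"MKTAYIAKQR": "hydrolase-like", "GGSWPLDEHN": "unknown-family"}  # toy references
print(transfer_annotation("MKTAYIAKQL", refs))  # ('hydrolase-like', 90.0)
```

The fixed cutoff is where the blind spot comes from: a distant homolog that shares only, say, 25% identity with every reference comes back unlabeled, even if it actually performs the same function.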


Three unavoidable bottlenecks:

Efficiency: the work relies on expert experience and trial and error, so cycles are long and costs are high;

Data: sequence data is abundant, but very few sequences carry functional annotations;

Tools: existing computational tools are fragmented, so analysis, prediction, and validation require switching between multiple platforms.


AI-driven sequence mining is changing this situation. By learning from massive collections of unannotated sequences, protein large language models can automatically extract deep features, process large-scale data efficiently, and even uncover 'dark matter' proteins that traditional methods struggle to reach.
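One common way such learned features are used in mining is nearest-neighbor search in embedding space: embed a query protein whose function is known, embed every candidate, and rank candidates by similarity. The sketch below shows only that retrieval step. As a loud assumption, a normalized dipeptide (2-mer) composition vector stands in for the embedding a protein language model would produce; real model embeddings capture far richer features. The function names and sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 dipeptides in a fixed order, so vectors from different sequences line up
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def embed(seq: str) -> list[float]:
    """Stand-in embedding: L2-normalized dipeptide counts (NOT a language model).

    Dipeptides containing non-standard residues are ignored.
    """
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    vec = [float(counts[d]) for d in DIPEPTIDES]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(u: list[float], v: list[float]) -> float:
    # embed() returns unit-length (or all-zero) vectors, so the dot product is the cosine
    return sum(a * b for a, b in zip(u, v))


def nearest_neighbors(query: str, candidates: dict[str, str], k: int = 3):
    """Rank candidate IDs by embedding similarity to the query sequence."""
    q = embed(query)
    scored = [(cosine(q, embed(seq)), cid) for cid, seq in candidates.items()]
    scored.sort(reverse=True)
    return [(cid, round(score, 3)) for score, cid in scored[:k]]


cands = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQL", "c": "WWWWWWWWWW"}
print(nearest_neighbors("MKTAYIAKQR", cands, k=2))  # [('a', 1.0), ('b', 0.889)]
```

With a real protein language model, only `embed` would change; the ranking logic stays the same, which is why embedding search scales to very large candidate pools.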


Typical application scenarios:

Industrial catalysis: finding enzymes that can remain stable in strong acids, strong bases, or high temperatures;

Biopharmaceuticals: designing antibodies or binding molecules that precisely target specific sites;

Synthetic biology: mining natural protein components with new functions.


3. Technological breakthrough: What makes the Xiaowu intelligent agent different?

MatwingsVenus

Tianwu Technology has released MatwingsVenus™ (Xiaowu™), a conversational protein R&D agent. Chief scientist Hong Liang pointed out that AI is transforming protein engineering from a complex "science" that depends on experience and luck into predictable, highly efficient "engineering."

This shift is underpinned by a protein dataset of nearly 15 billion sequences, nearly 6.5 billion of which carry functional labels describing protein performance under specific temperatures, pH levels, and pressures. Traditional methods, by contrast, may rely on only a few million annotated sequences drawn from scattered sources.

By learning the mapping between these sequences and their functions, the model captures the functional characteristics of different proteins, which enables it to identify and design sequences that meet target requirements.
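As a toy illustration of learning a sequence-to-function mapping, the sketch below fits a ridge-regression model from amino acid composition to a measured property, then uses it to score unseen sequences. This is an assumption-laden stand-in: the article does not describe Xiaowu's model, which is presumably far larger and uses much richer features. The features, the pure-Python solver, the regularization strength, and the synthetic training labels (generated from an arbitrary rule so the code runs) are all hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> list[float]:
    """Fraction of each standard amino acid; other characters are ignored."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid dividing by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(seqs: list[str], labels: list[float], lam: float = 1e-3) -> list[float]:
    """Learn weights w so that w . composition(seq) approximates the label.

    No separate intercept: composition fractions sum to 1 for sequences of
    standard residues, so a constant offset can be absorbed into the weights.
    """
    X = [composition(s) for s in seqs]
    d = len(AMINO_ACIDS)
    # Ridge normal equations: (X^T X + lam * I) w = X^T y
    xtx = [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0) for j in range(d)]
           for i in range(d)]
    xty = [sum(x[i] * y for x, y in zip(X, labels)) for i in range(d)]
    return solve(xtx, xty)


def predict(weights: list[float], seq: str) -> float:
    return sum(w * f for w, f in zip(weights, composition(seq)))


# Synthetic labels from an arbitrary rule (label = fraction of W), purely illustrative
w = ridge_fit(["AAAAW", "AAAWW", "AWWWW", "AAAAA", "WWWWW"], [0.2, 0.4, 0.8, 0.0, 1.0])
print(round(predict(w, "AAWWW"), 2))  # close to 0.6
```

The small ridge penalty keeps the normal equations solvable even for residues that never appear in the training set; those weights simply stay at zero.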

"Conversational" Interaction: Make sequence mining as simple as chatting

The core design philosophy of the MatwingsVenus™ agent is that you simply state your R&D requirements in natural language, as if talking to a real person (for example, "Design a protease sequence that works stably at pH 13"), and the system automatically breaks down the task and completes the entire process, from literature review and patent search to protein sequence mining and design.

The platform integrates 200+ professional protein design tools, 50+ certified experts, and 30+ skills tuned by experts from various fields, all accessed on demand through AI agents.

"Dry-Wet Closed Loop": Sequence mining is not the finish line; implementation is

The most critical design feature of the MatwingsVenus™ (Xiaowu™) agent is its "conversational dry-wet closed loop." After the AI agent completes a sequence design, the platform automatically sends the results to an automated shared laboratory through an in-house communication mechanism, where robots carry out sample preparation, protein purification, and functional testing. The experimental data are then fed back to the AI model for the next round of iterative optimization.

This "design is verification, verification is iteration" model means that sequence mining is no longer just a paper exercise in the digital world; it connects the entire chain from computation to physical experiments.
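The design, test, and feedback cycle described above can be sketched as a simple loop. Everything here is hypothetical and deliberately minimal: `propose_variants` stands in for the AI design step (random point mutants), `run_assay` stands in for the automated lab (a made-up score so the code runs), and instead of retraining a model, each round simply starts from the best sequence measured so far. The article does not detail Tianwu's actual mechanism; a real platform would presumably retrain its model on the accumulated measurements.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def propose_variants(parent: str, n: int, rng: random.Random) -> list[str]:
    """Hypothetical design step: n random single-point mutants of the parent."""
    variants = []
    for _ in range(n):
        pos = rng.randrange(len(parent))
        aa = rng.choice(AMINO_ACIDS)
        variants.append(parent[:pos] + aa + parent[pos + 1:])
    return variants


def run_assay(seq: str) -> float:
    """Hypothetical stand-in for the automated wet lab: a made-up score in [0, 1]."""
    return seq.count("W") / len(seq)


def closed_loop(start: str, rounds: int = 5, batch: int = 20, seed: int = 0):
    rng = random.Random(seed)
    history = {start: run_assay(start)}  # every measurement fed back so far
    best = start
    for _ in range(rounds):
        # Design: propose a batch of new candidates from the current best
        for seq in propose_variants(best, batch, rng):
            if seq not in history:  # only "test" designs not measured before
                history[seq] = run_assay(seq)
        # Feedback: the next round starts from the best measured sequence
        best = max(history, key=history.get)
    return best, history[best], len(history)


best, score, n_tested = closed_loop("MKTAYIAKQR")
print(best, score, n_tested)
```

Because every measurement is kept in `history`, the best score can never decrease from one round to the next; swapping the greedy parent selection for a retrained predictive model is the natural next step.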


4. Practical Validation: Real-life cases of the Xiaowu agent in sequence mining

Case 1: Immune-regulatory receptor, mining binding molecules from scratch

Project Background and Challenges: This is a highly challenging novel target. There are no reference molecules from similar drugs, the target surface is mostly polar and lacks typical high-affinity binding hotspots, and the natural ligands already bind with ultra-high, nanomolar-level affinity. Under these conditions, designing entirely new binding molecules (binders) from scratch is extremely difficult.

How Xiaowu mined it: Using the MatwingsVenus™ platform, the agents started from the target structure and functional requirements and automatically ran the full computational pipeline, including scaffold screening, interface design, sequence optimization, and druggability prediction, quickly generating high-quality binder sequences.

Result validation: Samples prepared by the automated experimental platform performed well in in vitro cell-based activity assays, with dozens of molecules showing clear blocking activity in cells and combining functional inhibition with high-affinity potential. The project thus completed the entire process of designing binders from scratch and validating them.

AI De Novo Design

Case 2: Industrial enzyme project, from literature "mining" to industrialization

Tianwu Technology employs two major strategies, "AI-directed evolution" and "AI enzyme mining," and has accumulated numerous successful cases involving extreme tolerance (high temperature, strong acid, strong alkali). For example, it completed the mining and optimization of a plastic-degrading enzyme in just a few months, moving beyond the inefficient traditional "needle in a haystack" approach.


So far, Tianwu has successfully delivered more than 30 protein projects, covering innovative pharmaceuticals, in vitro diagnostics, nutrition and health, food and beverages, beauty and skincare, bioenergy, and other fields, achieving industrialization for nearly 10 products.


5. From "Exclusive to Large Institutions" to "Available for Individuals": When Protein Sequence Mining Capabilities Are "Shared"

In the past, protein sequence mining was a highly centralized capability—requiring interdisciplinary teams, expensive experimental equipment, and long-term funding. The MatwingsVenus™ (Xiaowu™) agent transforms this capability into infrastructure that individual users can also access, driving protein research and development from being 'platform-driven' to 'personally usable.'


As project R&D leader Tan Yang said, 'An important change brought by AI is that some capabilities that were previously highly scarce are beginning to be accessed in a more widely available way.'


Practical value for small and medium R&D teams and entrepreneurs:

Lowering the entry barriers: Work can be carried out without a complete protein engineering team.

Shortening the R&D cycle: The traditional cycle of several months to years is compressed into weeks or even days.

Reducing trial-and-error costs: The AI agent conducts large-scale virtual screening first, and only the most promising candidate sequences proceed to wet-lab validation.
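The "screen virtually first, test only the most promising" step can be sketched as a budgeted selection. This minimal sketch assumes each candidate already carries a predicted score from some upstream model (not shown) and that the wet lab can only test a fixed number of sequences. The near-duplicate filter based on 3-mer Jaccard similarity, the 0.5 cutoff, and all names and sequences are illustrative assumptions, added so the budget is not spent on near-identical variants; the article does not say how Xiaowu selects candidates.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pick_for_wet_lab(candidates: list[tuple[str, float]], budget: int,
                     max_similarity: float = 0.5) -> list[tuple[str, float]]:
    """Greedy selection: best predicted score first, skipping near-duplicates of picks."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    picked: list[tuple[str, float]] = []
    for seq, score in ranked:
        if len(picked) >= budget:  # wet-lab capacity reached
            break
        ks = kmers(seq)
        if all(jaccard(ks, kmers(p)) <= max_similarity for p, _ in picked):
            picked.append((seq, score))
    return picked


cands = [("MKTAYIAKQR", 0.9), ("MKTAYIAKQL", 0.85), ("GGSWPLDEHN", 0.7), ("WWWWWWWWWW", 0.1)]
print(pick_for_wet_lab(cands, budget=2))  # [('MKTAYIAKQR', 0.9), ('GGSWPLDEHN', 0.7)]
```

Here the second-ranked sequence is skipped because it shares 7 of 9 distinct 3-mers with the top pick, so the second wet-lab slot goes to a more distinct candidate instead.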


6. Outlook and Conclusion

Tianwu Technology is continuously iterating its 'AI automated experiment' platform. Chief Scientist Hong Liang has proposed the vision of 'AI co-research scientists': in the future, AI will not only assist experts with design but will also proactively propose scientific hypotheses and design validation paths, becoming a primary participant alongside human scientists in collaborative innovation.


The improvement in protein sequence mining efficiency essentially provides a faster 'R&D engine' for the entire bioeconomy—from new drug discovery to green manufacturing, from novel materials to functional foods, every area that relies on protein technology will benefit.


Do you think the greatest value of AI in the field of protein design is currently 'speeding up' or 'discovering solutions that humans cannot think of'?