How do scientists conduct AI protein design?
Published on May 8, 2026

When you hear the term 'AI designing proteins,' the image that may come to mind is something like this: a scientist typing a few lines of code on a screen, and in the blink of an eye, a completely new protein, never seen in nature, comes into existence.
This picture is not as exaggerated as it sounds. In 2024, the Nobel Prize in Chemistry was awarded to David Baker for computational protein design and to Demis Hassabis and John Jumper for protein structure prediction. This means that AI protein design has officially moved from being a cutting-edge laboratory exploration to becoming mainstream science.
Traditional protein design was like 'finding a needle in the ocean': it could take years and had a very low success rate. With AI's involvement, this once 'hit-or-miss' research is being transformed into a far more precise, engineering-style process.
So the question arises: how exactly do scientists use AI to design proteins? What kind of 'techniques' are at play behind this process? Today, let's break it down.

A protein is essentially a chain of amino acids, but this chain folds into a specific shape in three-dimensional space, like an extremely complex piece of origami. Traditionally, to determine the structure of a protein, scientists had to use X-ray crystallography or cryo-electron microscopy, which could take months or even years.
AI has changed this situation. Models such as DeepMind's AlphaFold and RoseTTAFold, from David Baker's team, can predict the three-dimensional structure of a protein from its amino acid sequence in a few minutes to a few hours, often with high accuracy. To give an analogy, this is like previously having to manually measure every dimension of a building, whereas now you can simply give AI a floor plan, and it can quickly render a complete 3D model.
This achievement itself has been recognized with a Nobel Prize, but it can only be considered the "entry ticket" to AI protein design—we still need to go further.

Having a structure is not enough. What scientists really want to do is reverse engineering: given a target function (such as "an enzyme that can efficiently degrade plastic" or "an antibody that can precisely block cancer cells"), they "reverse-design" a protein sequence and structure that can achieve this function. This is what is called "de novo design."
The core logic behind this is to let AI "understand" the grammar of proteins.
A protein's amino acid sequence is like a language with only 20 letters (corresponding to the 20 standard amino acids). These letters combine, fold, and interact in specific ways to form "functional sentences" that have been selected through billions of years of evolution in nature. In recent years, scientists have developed a large number of protein language models, which are similar to the large language models behind ChatGPT, except that the training "corpus" consists of hundreds of millions of protein sequences.
Through large-scale pretraining, these models learn the evolutionary rules, structural constraints, and functional patterns in protein sequences. Once a model has mastered this "language," it can generate completely new protein sequences that are consistent with physicochemical principles, sequences that may never have appeared in nature.
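To make the "language" idea concrete, here is a minimal sketch, assuming a toy bigram model in place of a real protein language model. The three training sequences are made up for illustration, and the helper names (`tokenize`, `train_bigram`, `sample_sequence`) are hypothetical. Real models are large neural networks trained on hundreds of millions of sequences, but the basic move is the same: learn which residue tends to follow which, then sample new sequences.

```python
import random
from collections import defaultdict

# The 20 standard amino acids, one letter each: the "alphabet" of the protein language.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def tokenize(seq):
    """Map a protein sequence to integer token ids, as a language model would."""
    return [TOKEN_ID[aa] for aa in seq]

def train_bigram(sequences):
    """Count how often each amino acid follows another in the training corpus."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def sample_sequence(counts, start, length, rng):
    """Generate a new sequence by repeatedly sampling the next residue."""
    seq = [start]
    while len(seq) < length:
        followers = counts.get(seq[-1])
        if not followers:  # no observed continuation for this residue: stop early
            break
        residues = list(followers)
        weights = [followers[r] for r in residues]
        seq.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(seq)

# Made-up toy corpus (hypothetical sequences, not real proteins).
corpus = ["MKTAYIAKQR", "MKVLAAGIAK", "MKTLYVAKQR"]
model = train_bigram(corpus)
new_seq = sample_sequence(model, "M", 10, random.Random(0))
print(tokenize("MKT"), new_seq)
```

A bigram model only sees one residue of context, so it cannot capture the long-range contacts that determine folding. That gap is what large-scale pretraining is meant to close.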
One of the most important milestones was the introduction of diffusion models. Diffusion models first made a splash in the field of image generation (for example, Stable Diffusion), and scientists creatively transferred them to the field of protein design. The general principle is: first, noise is added to distort known protein structures, and then the model is trained to reverse this noise and restore the original. In this "destruction-reconstruction" loop, the model learns how to "grow," from random noise, a completely new protein backbone that meets specific geometric constraints.
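The "destruction-reconstruction" idea can be sketched in a few lines, assuming the standard noise-blending formula used by image diffusion models and a linear noise schedule. The four-point "backbone" and the function names are hypothetical, and this sketch only perturbs atom positions, a simplification of what structure-based protein diffusion models do.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear noise schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(coords, alpha_bar, rng):
    """Forward ("destruction") step: blend clean coordinates with Gaussian noise."""
    noise = [[rng.gauss(0.0, 1.0) for _ in atom] for atom in coords]
    noisy = [
        [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
         for x, e in zip(atom, eps)]
        for atom, eps in zip(coords, noise)
    ]
    return noisy, noise

def reconstruct(noisy, predicted_noise, alpha_bar):
    """Reverse ("reconstruction") step: recover clean coordinates from a noise estimate."""
    return [
        [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
         for x, e in zip(atom, eps)]
        for atom, eps in zip(noisy, predicted_noise)
    ]

# Hypothetical 4-residue backbone trace (made-up coordinates).
backbone = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]]
alpha_bars = make_alpha_bars(num_steps=1000)
rng = random.Random(0)

noisy, true_noise = add_noise(backbone, alpha_bars[500], rng)
# A trained network would predict the noise. Here we cheat and reuse the true noise
# to show that a perfect prediction recovers the original structure.
restored = reconstruct(noisy, true_noise, alpha_bars[500])
print(noisy[0], restored[0])
```

In a real model, the noise estimate comes from a neural network trained on many known structures, and generation starts from pure noise and applies many small denoising steps rather than one.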
On this basis, tools like ProteinMPNN complete the final step — "translating" the generated protein backbone back into a specific amino acid sequence.
This forms the classic two-step method: first generate the protein backbone, then "fill in" the sequence. In some respects, this method echoes the "inverse problem" in computational physics: rather than deriving protein folding from first principles, data-driven models learn the reverse mapping, from structure back to sequence.
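The control flow of the two-step method can be illustrated with a deliberately simple toy, assuming rule-based stand-ins for both steps. Neither function reflects how a diffusion model or ProteinMPNN works internally. The "backbone" here is reduced to a buried/exposed label per position, and the sequence step applies the familiar rule of thumb that hydrophobic residues favor the core while polar residues favor the surface. All names are hypothetical.

```python
import random

HYDROPHOBIC = "AVILMFW"  # residues that tend to pack in the buried core
POLAR = "STNQDEKR"       # residues that tend to sit on the solvent-exposed surface

def generate_backbone(length, rng):
    """Step 1 stand-in: return a 'buried' or 'exposed' label for each position.

    A real backbone generator would output 3D coordinates; this toy keeps only
    one structural feature per residue.
    """
    return [rng.choice(["buried", "exposed"]) for _ in range(length)]

def design_sequence(backbone, rng):
    """Step 2 stand-in: pick an amino acid compatible with each position."""
    return "".join(
        rng.choice(HYDROPHOBIC if site == "buried" else POLAR)
        for site in backbone
    )

rng = random.Random(42)
backbone = generate_backbone(12, rng)
sequence = design_sequence(backbone, rng)
print(backbone)
print(sequence)
```

The point of the split is that each step can be improved or swapped independently: a better backbone generator does not require retraining the sequence designer, and vice versa.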

If the above methods solve the problem of 'from structure to sequence,' then the ultimate pursuit of scientists is to bridge the final link of 'from sequence to function'—after all, the final deliverable is not a string of code, but a truly 'functional' protein. In technical terms, this is called a 'dry-wet closed loop': AI completes the design in the digital world (dry experiments), the robotic laboratory performs the verification (wet experiments), and the verification results are then fed back to the AI for the next round of optimization.
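A dry-wet closed loop can be sketched as a simple propose-measure-feedback cycle. This is a minimal sketch under strong assumptions: the "wet experiment" is simulated by a scoring function that counts matches against a hidden optimum, the "AI design" step is random single-point mutation, and every name and sequence below is made up for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def propose_variants(parent, n_variants, rng):
    """Dry step: propose candidates by mutating one position of the current best design."""
    variants = []
    for _ in range(n_variants):
        pos = rng.randrange(len(parent))
        variants.append(parent[:pos] + rng.choice(AMINO_ACIDS) + parent[pos + 1:])
    return variants

def run_assay(candidate, hidden_optimum):
    """Wet step stand-in: a simulated measurement (positions matching a hidden optimum)."""
    return sum(a == b for a, b in zip(candidate, hidden_optimum))

def closed_loop(start, hidden_optimum, rounds, n_variants, rng):
    """Alternate design and measurement, feeding each round's result into the next."""
    best, best_score = start, run_assay(start, hidden_optimum)
    history = [best_score]
    for _ in range(rounds):
        for cand in propose_variants(best, n_variants, rng):
            score = run_assay(cand, hidden_optimum)
            if score > best_score:  # feedback: keep only measured improvements
                best, best_score = cand, score
        history.append(best_score)
    return best, history

rng = random.Random(1)
best, history = closed_loop("AAAAAAAA", "MKTLYVAK", rounds=20, n_variants=10, rng=rng)
print(best, history)
```

In practice the design step would be a learned model updated with each batch of measurements, and the assay would be a real experiment with noise and cost, so the number of candidates per round becomes a key design choice.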
In this regard, companies in China have also put forward their own solutions.
In April 2026, Shanghai Tianwu Technology (Matwings Technology) released MatwingsVenus™ (XiaoWu™), a conversational agent for protein research and development that integrates the above ideas into a platform users can operate simply by 'chatting' with it.
The platform's logic is very interesting. Users state the task goal in natural language, for example, 'help me design a protein that can block a certain immune regulatory receptor.' The system then automatically breaks down the task, orchestrates over 200 underlying protein design tools, and completes the full computational workflow, from scaffold selection and interface design through sequence optimization to druggability prediction. The design results are then passed directly to automated laboratories, where robots handle sample preparation, protein purification, and functional testing. The test results are fed back into the next round of AI design, forming an iterative loop of 'computation driving wet experiments, and wet experiments feeding back into computation.'
In real drug development scenarios, this system has been validated in multiple projects. For example, in a de novo design project targeting a certain immune regulatory receptor, MatwingsVenus™ successfully obtained dozens of entirely new binder molecules with in vitro cell-blocking activity, completing the full-loop workflow of 'AI design—automated experiments—functional verification.'
It is worth noting that the creator of this platform, Professor Hong Liang, Chief Scientist of Matwings Technology, proposed an even more ambitious vision at the 2025 Pujiang Innovation Forum. He divided the development of AI protein design into three stages. The 'past tense' refers to widely recognized, mature tools such as AlphaFold and RFdiffusion. The 'present tense' refers to the AI agents and general-purpose large protein models in use today. The 'future tense,' which he calls the 'AI Co-scientist,' is the stage at which AI can proactively propose scientific hypotheses and design validation paths, becoming a principal collaborator with human scientists in innovation.

In essence, AI protein design is a paradigm shift from "luck-based" to "programmable."
In the past, protein engineering was more like a craft, relying on scientists' experience, intuition, and a lot of trial and error. Now, with the boom of tools such as geometric deep learning, diffusion models, and protein language models, protein design is gradually transforming from a complex "science" into a predictable, highly efficient "engineering" discipline.
Platforms like MatwingsVenus™ that connect AI prediction with automated experimentation further lower the barrier, bringing R&D capabilities that were previously accessible only to large institutions within reach of individuals and small teams. This might be the most exciting part of this transformation.
The "programmable era" of protein design is coming.