Answers to Important Questions:
Q: Why can’t current AI systems like AlphaFold predict all protein structures?
A: Because about 30% of human proteins, the so-called intrinsically disordered proteins, have no stable shape. They change conformation constantly and so resist fixed 3D models.
Q: How is this new AI method for designing proteins different from other methods?
A: Instead of predicting static shapes, it uses physics-based molecular simulations and automatic differentiation to “teach” the computer how sequence changes affect protein behavior.
Q: Why are disordered proteins so important?
A: These proteins control essential biological processes and are linked to diseases such as Parkinson’s, cancer and Alzheimer’s. This makes them important for future treatments.
Summary: A new machine learning method has achieved what even AlphaFold couldn’t: designing intrinsically disordered proteins (IDPs), shape-shifting biomolecules that make up about 30 percent of all human proteins. These flexible proteins play key roles in cell communication, sensing, and disease. However, their constantly changing structures pose a challenge to traditional AI prediction models.
Using automatic differentiation and physics-based simulations, scientists have developed an algorithm that can fine-tune amino acid sequences for specific functions. The breakthrough could revolutionize synthetic biology, drug discovery, and our understanding of diseases like Parkinson’s and cancer.
Key data:
- New AI method: Uses automatic differentiation to design disordered proteins based on actual molecular physics rather than learned predictions.
- Understanding the unknown: Intrinsically disordered proteins (IDPs), which never settle into a defined structure, are essential for cell signaling and are linked to neurodegenerative diseases.
- Major impact: The discovery paves the way for the design of synthetic proteins for drugs, sensors and molecular engineering.
Source: Harvard
In synthetic and structural biology, advances in artificial intelligence have led to an explosion in the design of new proteins with specific functions, from antibodies to blood clotting agents. This is achieved by using computers to accurately predict the 3D structure of any amino acid sequence.
But predicting the structure of about 30 percent of the proteins in the human genome is difficult even for the most powerful AI tools, including the Nobel Prize-winning AlphaFold.
These proteins, called intrinsically disordered, never adopt a fixed shape and instead shift continually between conformations. They play key roles in a variety of biological functions, such as binding molecules, sensing, or signaling. However, their inherent instability makes them difficult to design from scratch.
A team from Harvard’s John A. Paulson School of Engineering and Applied Sciences (SEAS) and Northwestern University has demonstrated a new machine learning method for designing intrinsically disordered proteins with tailored properties.
This work opens the door to a new understanding of these mysterious biological molecules and could lead to new knowledge about the origins and treatment of diseases.

The work, published in Nature Computational Science, was co-led by Ryan Krueger, a doctoral student at SEAS, and Krishna Shrinivas, a former NSF-Simons QuantBio fellow and now an assistant professor at Northwestern University, in collaboration with Michael Brenner, professor of applied mathematics and applied physics at SEAS.
Shrinivas said he is interested in intrinsically disordered proteins because they remain beyond the reach of current AI-based methods for predicting and designing protein structures, such as Google DeepMind’s AlphaFold.
However, these disordered proteins are important for many fundamental aspects of biology. Moreover, mutations in these proteins have been linked to diseases such as cancer and neurodegeneration.
One example of a disordered protein is alpha-synuclein, which has long been linked to Parkinson’s disease and other disorders.
To design intrinsically disordered proteins (IDPs) for synthetic or therapeutic purposes, Shrinivas said, “We either have to build better AI models, or find a way to use these physical models so that we not only get good predictions but also provide the physics for free.”
Automatic differentiation algorithm
The paper describes a computational method based on algorithms that can perform “automatic differentiation”, or automatic calculation of derivatives (instantaneous rates of change), to rationally select protein sequences with desired behaviors or properties.
The technique is widely used for deep learning and training neural networks, but Brenner and his lab were among the first to recognize other potential applications, such as improving physics-based molecular dynamics simulations.
Thanks to automatic differentiation, the researchers were able to make a computer recognize how small changes in the protein sequence, even single amino acid changes, affect the desired final properties of the protein.
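As a rough illustration of what automatic differentiation provides, the following JAX sketch computes how a toy property responds to every possible change at every sequence position. Everything in it is an assumption made for illustration and is not the authors' code: the four-letter alphabet, the `CHARGE` table, and `toy_property` (squared expected net charge) stand in for a real amino acid model and a simulated observable.

```python
import jax
import jax.numpy as jnp

# Hypothetical 4-letter toy alphabet (real proteins use 20 amino acids):
# lysine (+1), glutamate (-1), glycine (0), serine (0).
ALPHABET = "KEGS"
CHARGE = jnp.array([1.0, -1.0, 0.0, 0.0])


def net_charge(logits):
    """Expected net charge of a 'soft' (continuous) sequence.

    logits has shape (length, 4). A softmax turns each row into
    probabilities over ALPHABET, so the sequence becomes differentiable.
    """
    probs = jax.nn.softmax(logits, axis=-1)
    return jnp.sum(probs @ CHARGE)


def toy_property(logits):
    """Stand-in for a simulated observable: squared expected net charge."""
    return net_charge(logits) ** 2


# A six-residue soft sequence with a mild bias toward lysine.
logits = jnp.zeros((6, 4)).at[:, 0].set(1.0)

# jax.grad builds the derivative of toy_property with respect to its input.
sensitivity = jax.grad(toy_property)(logits)  # shape (6, 4)

print(sensitivity)
```

Each entry of `sensitivity` says how the property would shift if the corresponding residue became slightly more likely at that position. In this toy, raising a lysine logit increases the property and raising a glutamate logit decreases it.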
They compared their method to a powerful search engine for amino acid sequences that meet the criteria needed to perform a function, for example, one that forms loops or linkers, or one that can detect different elements in the environment. “We didn’t want to collect a lot of data and train a machine learning model to design proteins,” says Krueger.
“We wanted to use existing, fairly accurate simulations to design proteins at the level of those simulations.”
This method uses a traditional framework for training neural networks, called gradient-based optimization, to efficiently and accurately identify novel protein sequences.
The result is that the proteins designed by the researchers are “differentiable.” This means they are not based on AI predictions, but on molecular dynamics simulations that use real physics. This takes into account how proteins actually behave dynamically in nature.
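The gradient-based optimization described above can be sketched in JAX with a plain gradient-descent loop. This is a toy under stated assumptions, not the published method: the differentiable objective here is a closed-form expected net charge on a hypothetical four-letter alphabet, whereas the researchers differentiate through molecular dynamics simulations. The names `TARGET` and `design` and the step size are illustrative choices.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy alphabet and charges (assumptions for illustration only).
ALPHABET = "KEGS"  # lysine, glutamate, glycine, serine
CHARGE = jnp.array([1.0, -1.0, 0.0, 0.0])
TARGET = -2.0  # hypothetical design goal: expected net charge of -2


def expected_charge(logits):
    """Expected net charge of a soft sequence with shape (length, 4)."""
    probs = jax.nn.softmax(logits, axis=-1)
    return jnp.sum(probs @ CHARGE)


def loss(logits):
    """Squared distance between the current property and the target."""
    return (expected_charge(logits) - TARGET) ** 2


@jax.jit
def step(logits, lr):
    """One gradient-descent update on the sequence logits."""
    grads = jax.grad(loss)(logits)
    return logits - lr * grads


def design(length=8, steps=200, lr=0.2, seed=0):
    """Optimize a soft sequence toward TARGET; returns probs and losses."""
    key = jax.random.PRNGKey(seed)
    logits = 0.1 * jax.random.normal(key, (length, len(ALPHABET)))
    initial_loss = float(loss(logits))
    for _ in range(steps):
        logits = step(logits, lr)
    probs = jax.nn.softmax(logits, axis=-1)
    return probs, initial_loss, float(loss(logits))


probs, initial_loss, final_loss = design()
print(f"loss: {initial_loss:.3f} -> {final_loss:.2e}")
```

The result is a relaxed, probabilistic sequence. Converting it back to a discrete amino acid sequence is a separate step that this sketch leaves out.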
Funding: This research received federal support from the National Science Foundation AI Institute in Dynamic Systems, the Office of Naval Research, the Harvard Materials Research Science and Engineering Center, and the NSF-Simons Center for Mathematical and Statistical Analysis of Biology at Harvard.
About this AI and genetics research news
Author: Anne Manning
Source: Harvard
Contact: Anne Manning – Harvard
Image: The image is credited to StackZone Neuro
Original Research: Closed access.
“Generalized design of sequence–ensemble–function relationships for intrinsically disordered proteins” by Ryan Krueger et al. Nature Computational Science
Abstract
Generalized design of sequence–ensemble–function relationships for intrinsically disordered proteins
The design of folded proteins has improved significantly in recent years. However, many proteins and protein regions are intrinsically disordered and lack a stable fold. That is, the sequence of an intrinsically disordered protein (IDP) encodes a broad ensemble of spatial conformations that determines its biological function. This plasticity and conformational heterogeneity complicate the design of IDPs.
Here we present a computational framework for the de novo design of IDPs through rational and efficient inversion of molecular simulations that approximate the underlying relationship between sequence and ensemble. We emphasize the versatility of this approach by designing IDPs with diverse properties and arbitrary sequence constraints.
These include IDPs with target ensemble dimensions, loops and linkers, highly sensitive sensors of physicochemical stimuli, and binders that target disordered substrates with different conformational preferences.
In summary, our method provides a general framework for designing sequence–ensemble–function relationships of biological macromolecules.

