Just as ChatGPT generates text by predicting the most likely words in a sequence, new artificial intelligence (AI) models can create new proteins from scratch that don’t occur in nature.
Using their new model, ESM3, the scientists created a new fluorescent protein that shares just 58% of its sequence with naturally occurring fluorescent proteins, they said in a study published July 2 to the preprint database bioRxiv. Representatives from EvolutionaryScale, a company founded by former Meta researchers, also provided more details in a June 25 statement.
The research team has released a smaller version of the model under a non-commercial license, and the company plans to make larger versions available to commercial researchers. EvolutionaryScale says the technology could be useful in a variety of fields, from discovering new drugs to designing new chemicals that break down plastics.
ESM3 is a large language model (LLM) similar to OpenAI’s GPT-4, which powers the ChatGPT chatbot. The scientists trained the largest version on 2.78 billion proteins. For each protein, they extracted information about its sequence (the order of the amino acid building blocks that make up the protein), structure (the protein’s folded three-dimensional shape), and function (what the protein does). They then randomly masked some of this information and asked ESM3 to predict the missing pieces.
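To make the masking idea concrete, here is a minimal sketch in Python that hides a random subset of positions in an amino acid sequence and records what a model would be asked to recover. It illustrates the general technique only and is not EvolutionaryScale's training code: the function name `mask_sequence`, the `_` mask symbol, and the 25% mask fraction are all assumptions chosen for this example, and it covers only the sequence, not structure or function.

```python
import random

MASK = "_"  # hypothetical mask symbol; not taken from the ESM3 study


def mask_sequence(sequence, mask_fraction=0.25, seed=None):
    """Hide a random subset of residues in a protein sequence.

    Returns the masked sequence and a dict mapping each hidden
    position to the residue a model would be asked to predict.
    The default mask_fraction is an arbitrary illustrative value.
    """
    if not 0.0 <= mask_fraction <= 1.0:
        raise ValueError("mask_fraction must be between 0 and 1")

    rng = random.Random(seed)
    n_masked = round(len(sequence) * mask_fraction)
    positions = sorted(rng.sample(range(len(sequence)), n_masked))

    residues = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = residues[pos]  # remember the true residue
        residues[pos] = MASK          # hide it from the model
    return "".join(residues), targets


if __name__ == "__main__":
    # Toy 20-residue sequence used purely for illustration.
    masked, targets = mask_sequence("MSKGEELFTGVVPILVELDG", seed=0)
    print(masked)
    print(targets)
```

During training, a model sees only the masked string and is scored on how well it recovers the hidden residues stored in `targets`.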
They scaled up this model from work the same team had done while at Meta. In 2022, they launched ESMFold, a precursor to ESM3 that predicted the structures of unknown microbial proteins. DeepMind has also predicted the structures of 200 million proteins.
Scientists have since pointed out the predictive limitations of these AI models, and the protein predictions still need to be validated. But this approach can significantly speed up protein structure exploration, because the alternative is to map protein structures piece by piece using X-rays, which is time-consuming and costly.
ESM3 goes beyond predicting existing proteins: using information gleaned from 771 billion unique pieces of information on structure, function, and sequence, the model can generate new proteins with specific functions. One of EvolutionaryScale’s backers has described this as a “ChatGPT moment” for biology.
In the new study, the researchers asked the model to produce a novel fluorescent protein, a type of protein that captures light and releases it at longer wavelengths, making it glow green. These proteins are important to biology researchers, who attach them to the molecules they study in order to track and image them. Their discovery and development was recognized with the Nobel Prize in Chemistry in 2008.
The model generated 96 proteins with sequences and structures that could potentially fluoresce. The researchers then selected the protein that shared the least sequence with naturally occurring fluorescent proteins. This protein was 50 times dimmer than natural green fluorescent proteins, but ESM3 generated another iteration that led to new sequences with increased brightness. The result was a green fluorescent protein that does not exist in nature, called “esmGFP.” The EvolutionaryScale team estimated that these iterations, which the AI carried out in moments, were equivalent to 500 million years of evolution.
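The idea of choosing the candidate that shares the least sequence with natural proteins can be sketched with a simple percent-identity calculation. The Python below is an illustration under strong assumptions, not the method used in the study: it assumes the sequences are already aligned and of equal length, the helper names `percent_identity` and `least_similar` are hypothetical, and the short strings are toy data rather than real fluorescent proteins.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two pre-aligned sequences match.

    Assumes both sequences are already aligned and of equal length;
    real pipelines would run a sequence alignment tool first.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def least_similar(candidates, references):
    """Return the candidate whose closest reference match is lowest."""
    def closest_match(candidate):
        return max(percent_identity(candidate, ref) for ref in references)
    return min(candidates, key=closest_match)


if __name__ == "__main__":
    # Toy sequences for illustration only.
    natural = ["MSKGEELFTG", "MSKGAELFTG"]
    generated = ["MSKGEELFTA", "MTRGDDLYSG", "MSKGEDLFTG"]
    pick = least_similar(generated, natural)
    print(pick, max(percent_identity(pick, ref) for ref in natural))
```

A figure such as the 58% reported for the new protein is this kind of measure: the share of aligned positions at which two sequences carry the same amino acid.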
“At this point, we still lack a fundamental understanding of how proteins, especially ‘new to science’ proteins, behave when introduced into living systems, but this is an exciting new step that allows us to approach synthetic biology in a new way. AI modeling like ESM3 will enable the discovery of new proteins that would never be possible under the constraints of natural selection, resulting in innovations in protein engineering that would not be possible through evolution. This is exciting. But the claim to simulate 500 million years of evolution focuses only on individual proteins and does not take into account the many stages of natural selection that create the diversity of life we know today. While AI-driven protein engineering is intriguing, I can’t help but feel that it is overconfident to assume that we can outperform a complex process honed by millions of years of natural selection.”