Solving biology's 50-year grand challenge


To develop AlphaFold 2 for the CASP competition, DeepMind trained it on the sequences and structures of more than 170,000 proteins in the Protein Data Bank, along with data from other large databases.


The metric used to evaluate the accuracy of protein structure prediction is the Global Distance Test (GDT). It measures how far the predicted positions of the amino acids deviate from their actual, experimentally determined positions: the smaller the deviation, the higher the score.
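To make the scoring idea concrete, here is a minimal Python sketch of the commonly reported GDT_TS variant, which averages the percentage of residues whose Cα atoms lie within 1, 2, 4 and 8 Å of their experimental positions. It assumes the predicted and experimental structures have already been superimposed and that we are given, for each residue, the Cα deviation in ångströms. The real CASP evaluation also searches over many superpositions to maximize the count at each cutoff, which this sketch omits. The function name `gdt_ts` and the input format are illustrative choices, not an official API.

```python
from typing import Sequence

# Distance cutoffs (in angstroms) used by the GDT_TS variant of the score.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(deviations: Sequence[float]) -> float:
    """Simplified GDT_TS score on a 0-100 scale.

    deviations[i] is the distance (angstroms) between the predicted and the
    experimentally determined C-alpha position of residue i. The two
    structures are assumed to be already superimposed (a simplification:
    the real GDT searches over superpositions for each cutoff).
    """
    if not deviations:
        raise ValueError("need at least one residue")
    n = len(deviations)
    # Percentage of residues within each cutoff...
    percents = [100.0 * sum(d <= c for d in deviations) / n for c in CUTOFFS]
    # ...then averaged over the four cutoffs.
    return sum(percents) / len(CUTOFFS)


if __name__ == "__main__":
    # Hypothetical per-residue deviations for a 5-residue toy protein.
    toy = [0.5, 1.5, 3.0, 6.0, 12.0]
    print(gdt_ts(toy))  # prints 50.0
```

In the toy example, 20%, 40%, 60% and 80% of residues fall within the 1, 2, 4 and 8 Å cutoffs, which averages to 50. A prediction with every residue within 1 Å would score 100, and the score drops as residues drift past the larger cutoffs.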


GDT scores range from 0 to 100. Between 2006 and 2016, the best scores hovered around 40. In 2018, the previous version of AlphaFold broke through 50. This time, the new generation of AlphaFold achieved a median score of 92.4 in the protein structure prediction contest.


The result is even clearer visually: when the predicted structure is compared with the actual structure, the two are almost identical (in the figure below, green is the experimentally determined structure and blue is the predicted structure).


Note: in this year's biennial Critical Assessment of protein Structure Prediction (CASP) competition, AlphaFold beat all other teams and matched experimental results in accuracy. As prediction difficulty increases, AlphaFold's accuracy stays consistently high, far ahead of other teams and of previous competitions.


However, many people have doubts about this result, mainly about its accuracy. First, 170,000 structures may be far from enough training data, which casts doubt on the accuracy. Moreover, protein folding is such a profound problem that if it had truly been solved, DeepMind would soon be in line for a Nobel Prize.


Therefore, while marveling at AlphaFold's achievements, we still need to wait calmly for experimental verification by biologists.


Beyond these open questions, the research method itself may be the more valuable contribution.


AlphaFold's predicted structures are comparable in quality to those produced by standard experimental methods such as X-ray crystallography and cryo-EM, which are far more laborious and expensive. Scientists say AlphaFold may not completely replace these experimental methods, but it does offer a new way to study biology.


Protein structure: biology's 50-year challenge


Proteins are the basis of life and make up much of every cell. A protein's function depends on its 3D structure.


Biologists have long tried to solve one of life's mysteries: how a protein's amino acid sequence (the chain of building blocks that makes up the protein) determines its final 3D shape.


Traditionally, protein structures have been determined in the laboratory. In X-ray crystallography, for example, a beam of X-rays is fired at a crystallized protein, and the resulting diffraction pattern is converted into the protein's atomic coordinates; this is how the first complete protein structures were obtained.


Besides experiments, as computers developed, researchers began in the late 20th century to predict protein structures computationally, but the results were disappointing.


It was not until AlphaFold appeared at CASP in 2018 that scientists regained confidence that computers could crack the problem of predicting protein structure.

The first version of AlphaFold applied deep learning to structural and genetic data to predict the distances between pairs of amino acids in a protein. In a second step that did not use AI, AlphaFold used this information to build a consensus model of what the protein should look like, according to John Jumper, one of AlphaFold's lead researchers.
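The second, non-AI step, turning a set of predicted pairwise distances into a 3D model that agrees with them, can be illustrated with a toy Python sketch. This is not AlphaFold's actual procedure: it simply uses plain gradient descent to move points in 3D until their pairwise distances match a target distance matrix (a "stress" minimization). The function names, the learning rate and the four-point example structure are all hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Sequence

Vec3 = List[float]


def pairwise_distances(coords: Sequence[Vec3]) -> List[List[float]]:
    """Euclidean distance matrix for a list of 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def stress(coords: Sequence[Vec3], target: Sequence[Sequence[float]]) -> float:
    """Sum over pairs i < j of (current distance - target distance)^2."""
    n = len(coords)
    return sum((math.dist(coords[i], coords[j]) - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def fit_coordinates(target: Sequence[Sequence[float]], steps: int = 5000,
                    lr: float = 0.01, seed: int = 0) -> List[Vec3]:
    """Gradient descent on the stress, starting from random 3D positions."""
    n = len(target)
    rng = random.Random(seed)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                diff = [coords[i][k] - coords[j][k] for k in range(3)]
                # Clamp to avoid dividing by zero if two points coincide.
                dist = max(math.sqrt(sum(c * c for c in diff)), 1e-9)
                # Gradient of (dist - d_ij)^2 w.r.t. point i; point j gets the negative.
                scale = 2.0 * (dist - target[i][j]) / dist
                for k in range(3):
                    grads[i][k] += scale * diff[k]
                    grads[j][k] -= scale * diff[k]
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grads[i][k]
    return coords


if __name__ == "__main__":
    # Hypothetical "true" structure: four points standing in for residues.
    true_coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0],
                   [3.8, 3.8, 0.0], [3.8, 3.8, 3.8]]
    target = pairwise_distances(true_coords)
    model = fit_coordinates(target)
    print(f"remaining stress: {stress(model, target):.6f}")
```

Because it only sees distances, a fit like this can recover a shape only up to rotation, translation and mirror reflection, and gradient descent can in principle stall in a local minimum. That is one reason practical structure-building methods bring in additional information rather than relying on distances alone.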


But this first approach had shortcomings, so the team developed an AI network that incorporates information about the physical and geometric constraints that determine how proteins fold. They also set it a harder task: instead of predicting only the relationships between amino acids, the network predicts the final structure of the target protein sequence.


Amazing accuracy


CASP runs over several months.


In 1994, John Moult and his colleagues launched CASP, which is held every two years. Competing teams receive the amino acid sequences of about 100 proteins whose structures are unknown. Some groups compute a structure for each sequence, while other groups determine the structures experimentally. The organizers then compare the computational predictions with the laboratory results and give each prediction a Global Distance Test (GDT) score.


Teams have several weeks to submit their structure predictions. A panel of independent scientists then evaluates each group's predictions, using metrics that measure how closely the predicted structures match the experimentally determined ones. The groups remain anonymous during the assessment.


In this year's competition, AlphaFold competed as "group 427". Many of its predictions were astonishingly accurate: nearly two-thirds were comparable to the experimentally determined structures.


Across all target proteins, AlphaFold achieved a median GDT score of 92.4. On moderately difficult proteins, the best entries from other teams typically scored about 75 out of 100, whereas AlphaFold scored around 90. On the harder targets, AlphaFold's median score was 87, 25 points higher than the next-best prediction.

AlphaFold even performed well on proteins embedded in cell membranes. Such proteins are central to many human diseases but are notoriously difficult to solve with X-ray crystallography. Venki Ramakrishnan, a structural biologist at the Medical Research Council Laboratory of Molecular Biology, said the results represent "a remarkable advance in protein folding."


According to Moult, predictions that score above 90 are equivalent to structures obtained by experimental methods.
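Both summary figures used in this section, a median GDT over all targets and Moult's rule of thumb that scores above 90 are on par with experiment, are simple arithmetic. The short Python sketch below computes them for a list of per-target scores. The scores in it are invented for illustration and are not CASP14 results, and `summarize` is a hypothetical helper.

```python
from statistics import median
from typing import Dict, Sequence

# Score above which, per Moult, a prediction is considered on par with experiment.
EXPERIMENTAL_LEVEL = 90.0


def summarize(scores: Sequence[float]) -> Dict[str, float]:
    """Median GDT and share of targets scoring above the experimental level."""
    if not scores:
        raise ValueError("need at least one score")
    above = sum(s > EXPERIMENTAL_LEVEL for s in scores)
    return {
        "median": median(scores),
        "fraction_above_90": above / len(scores),
    }


if __name__ == "__main__":
    # Hypothetical per-target GDT scores for one group (not real CASP data).
    group_scores = [95.0, 88.0, 91.0, 97.0, 70.0]
    print(summarize(group_scores))
    # prints {'median': 91.0, 'fraction_above_90': 0.6}
```

The median is used rather than the mean because a few very hard targets with low scores would otherwise drag the headline number down disproportionately.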


However, not all of AlphaFold's predictions were perfect. For one protein made of 52 small repeating segments that distort one another's positions as they assemble, AlphaFold's prediction differed somewhat from the experimental result.


Moult, who leads CASP, said it was impossible to tell whether the discrepancy was an error in AlphaFold's prediction or an artifact of the experiment.


In addition, AlphaFold's predictions matched poorly with structures determined experimentally by nuclear magnetic resonance (NMR) spectroscopy, but this may come down to how the raw data are converted into a model.


AlphaFold's network also struggles to model individual structures within protein complexes, or groups of proteins, where interactions with other proteins distort their shapes.




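AlphaFold's predictions helped determine the structure of a bacterial protein that Andrei Lupas, an evolutionary biologist at the Max Planck Institute for Developmental Biology in Tübingen, Germany, had been trying to crack for years.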


Lupas's team had already collected raw X-ray diffraction data, but converting these Rorschach-like patterns into a structure requires some prior knowledge of what the protein looks like. "After a decade of trying everything, the group 427 models gave us the structure in half an hour," Lupas said.


According to Demis Hassabis, co-founder and CEO of DeepMind, AlphaFold can take several days to produce a predicted structure, which includes estimates of how reliable the prediction is for different regions of the protein. DeepMind plans to make AlphaFold available to scientists.


Hassabis expects AlphaFold to be used in drug discovery and protein design.


With AlphaFold, drug designers could quickly determine the structures of proteins from dangerous new pathogens such as SARS-CoV-2, a key step in finding molecules that prevent disease.

Stephen Brohawn, a molecular neurobiologist at the University of California, Berkeley, said DeepMind's prediction for ORF3a, a SARS-CoV-2 protein, turned out to be very similar to the structure later determined by cryo-EM.


AlphaFold may mean that good protein structures are no longer limited to painstaking laboratory work: lower-quality experimental data that is easier to collect may be enough. Some fields, such as the evolutionary analysis of proteins, could flourish because the vast amount of available genomic data can now be translated into structures.


Scientists commented that AlphaFold could help researchers understand the functions of thousands of unsolved proteins in the human genome, and make sense of disease-causing genetic variations that differ from person to person.


AlphaFold also changes the public's impression of DeepMind. Until now, DeepMind was best known for using AI to play games, as with AlphaGo. With AlphaFold moving into biology through protein structure prediction, DeepMind now has a new message for the outside world:


Its AI can do more than play Go; it can also help advance the life sciences over the long term.



Updated: 2020-12-01 14:48:12

