Solving biology's major challenge of the past 50 years

For the CASP competition, DeepMind trained AlphaFold2 on more than 170,000 protein sequences and structures from a protein database, as well as data from other large databases.

The metric used to evaluate the accuracy of a protein structure prediction is called GDT (Global Distance Test). It measures how far the predicted amino acid positions deviate from the actual positions: the smaller the deviation, the higher the score.

GDT scores range from 0 to 100. Between 2006 and 2016, the best scores hovered around 40. In 2018, the previous generation of AlphaFold pushed the score past 50. This time, the new generation of AlphaFold scored 92.4 in the protein structure prediction competition.
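To make the 0 to 100 scale concrete, here is a minimal sketch of how a GDT-style score can be computed. It uses the commonly cited GDT_TS form: the average, over distance cutoffs of 1, 2, 4 and 8 ångströms, of the percentage of residues whose predicted position lies within the cutoff of the experimental position. The function name `gdt_ts` and the toy coordinates are illustrative assumptions, and the sketch assumes the two structures are already superimposed; the real CASP evaluation also searches for the best superposition, which is omitted here.

```python
import math

def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two pre-aligned lists of (x, y, z) residue positions.

    Assumption: the structures are already superimposed. The official
    evaluation also optimizes the superposition, which this sketch skips.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Distance between each predicted residue and its experimental counterpart.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each cutoff, averaged over the cutoffs.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy example: four residues, displaced by 0.5, 1.5, 3.0 and 10.0 angstroms.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (1.0, 1.5, 0.0), (2.0, 3.0, 0.0), (3.0, 10.0, 0.0)]
print(gdt_ts(predicted, reference))  # 56.25
```

A perfect prediction scores 100 under this definition, and a single badly misplaced residue lowers the score only slightly, which is why the metric tolerates local errors better than a plain average distance would.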

A visual comparison is even more intuitive. Placing the predicted structure next to the actual structure shows that the two are essentially identical (in the figure below, green is the actual structure obtained by experiment, and blue is the computed prediction).

Note: In this year's biennial Critical Assessment of Protein Structure Prediction (CASP) competition, AlphaFold beat all other teams and matched experimental results in accuracy. As prediction difficulty increased, AlphaFold's accuracy stayed at a consistently high level, far ahead of both the other teams and the results of previous competitions.

However, many people are skeptical of this result, mainly regarding its accuracy. First, 170,000 data points would seem far from enough, which calls the accuracy into question. Moreover, the protein folding problem is so profound that, if it had truly been solved, DeepMind would soon win a Nobel Prize.

Therefore, while marveling at AlphaFold's achievement, we still need to wait calmly for experimental verification by biologists.

Beyond these open questions, the research method itself is arguably the more valuable contribution.

After all, AlphaFold's predicted structures are nearly indistinguishable from those determined by standard experimental methods such as X-ray crystallography or cryo-EM, which are far more laborious and expensive. Scientists say AlphaFold may not completely replace these experimental methods, but it does offer a new way to study biology.

Protein structure: a 50-year challenge in biology

Proteins are the basis of life and are central to the makeup of cells. A protein's function depends on its 3D structure.

Biologists have long tried to solve one of life's mysteries: how a sequence of amino acids (the building blocks of a protein) folds into its final shape.

In the past, protein structures were determined in the laboratory. For example, the first complete protein structures were obtained by irradiating crystallized proteins with X-ray beams and converting the diffracted rays into atomic coordinates.

Beyond experiments, as computers developed, researchers began using them to predict protein structures toward the end of the last century, but the results were unsatisfactory.

It was not until AlphaFold appeared at CASP in 2018 that scientists regained confidence and hope in the struggling effort to predict protein structure computationally.

The first iteration of AlphaFold applied deep learning to structural and genetic data to predict the distances between pairs of amino acids in a protein. According to John Jumper, one of AlphaFold's lead researchers, in a second step that did not use AI, AlphaFold used this information to build a model of what the protein should look like, in line with earlier research.
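As an illustration of what "distances between pairs of amino acids" means as a prediction target, the sketch below builds the pairwise distance matrix of a known structure. It assumes one representative 3D coordinate per residue, and the function name `pairwise_distances` and the toy coordinates are hypothetical. It shows only the kind of quantity being predicted, not how AlphaFold predicts it.

```python
import math

def pairwise_distances(coords):
    """Return an n x n matrix of Euclidean distances between residue positions.

    Assumption: `coords` holds one (x, y, z) point per amino acid residue.
    Entry [i][j] is the distance between residue i and residue j.
    """
    n = len(coords)
    return [
        [math.dist(coords[i], coords[j]) for j in range(n)]
        for i in range(n)
    ]

# Toy three-residue "protein" with made-up coordinates (in angstroms).
residues = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
for row in pairwise_distances(residues):
    print([round(d, 2) for d in row])
```

The matrix is symmetric with zeros on the diagonal, so a predictor only needs to estimate the entries above the diagonal; turning such distances back into 3D coordinates is a separate step.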

But the first iteration had shortcomings, so the team developed a new AI network that incorporates information about the physical and geometric constraints that determine how a protein folds. They also set a daunting goal: the network should predict the final structure of the target protein sequence, not just the relationships between amino acids.

Amazing accuracy

The CASP competition runs over several months.

Moult and his colleagues launched CASP in 1994, and it has been held every two years since. Participating teams receive the amino acid sequences of about 100 proteins whose structures are unknown. Some groups compute a structure for each sequence, while others determine it experimentally. The organizers then compare the computed predictions with the laboratory results and assign each prediction a global distance test (GDT) score.

Teams have several weeks to submit their structure predictions. A group of independent scientists then evaluates each team's predictions using metrics that measure how closely a predicted protein matches the experimentally determined structure. The teams remain anonymous to the assessors.

In this year's competition, AlphaFold was listed as "group 427". Many of its predictions were astonishingly accurate, and nearly two-thirds were comparable in quality to experimental structures.

AlphaFold achieved a median GDT score of 92.4 across all target proteins. On moderately difficult proteins, the best of the other teams typically scored about 75 out of 100, while AlphaFold scored about 90. On the most challenging proteins, its median score was 87, 25 points above the next best predictions.

AlphaFold even excelled at solving the structures of proteins embedded in cell membranes. Such proteins are central to many human diseases but are notoriously difficult to solve with X-ray crystallography. Venki Ramakrishnan, a structural biologist at the Medical Research Council Laboratory of Molecular Biology, said the result marks a remarkable advance on the protein folding problem.

According to Moult, predictions that score above 90 are comparable to results from experimental methods.

However, AlphaFold's predictions were not all perfect. For one protein made up of 52 small repeating segments, which distort each other's positions as they assemble, AlphaFold's prediction differed somewhat from the experimental result.

Moult, who leads CASP, said it was impossible to tell whether this was a prediction error by AlphaFold or an artifact of the experiment.

In addition, AlphaFold's predictions were poor matches for experimental structures determined by nuclear magnetic resonance (NMR) spectroscopy, which may stem from how the raw data are converted into a model.

The network also struggles to model individual structures within protein complexes, or groups, where interactions with other proteins distort their shapes.

Applications

AlphaFold's predictions helped determine the structure of a bacterial protein that Lupas had been trying to crack for years.

Lupas's team had previously collected raw X-ray diffraction data, but turning these Rorschach-like patterns into a structure requires some prior knowledge of the protein's shape. "After ten years of trying everything, the model from group 427 gave us the structure in half an hour," Lupas said.

According to Demis Hassabis, co-founder and CEO of DeepMind, AlphaFold can take several days to predict a protein structure, and the result includes reliability estimates for different regions of the protein. AlphaFold will also be made available to scientists.

Hassabis believes AlphaFold holds promise for drug discovery and protein design.

With AlphaFold, drug designers could quickly determine the structures of the various proteins in dangerous new pathogens such as SARS-CoV-2, a key step in finding molecules that can prevent disease.

Stephen Brohawn, a molecular neurobiologist at the University of California, Berkeley, said DeepMind's prediction for a protein called Orf3a turned out to be very similar to the structure later determined by cryo-EM.

The arrival of AlphaFold may mean that obtaining a good protein structure no longer depends entirely on the laboratory: lower-quality, easier-to-collect experimental data may be all that is needed. Evolutionary analysis of proteins, for example, is set to flourish because large amounts of genomic data can now be translated into structures.

Scientists commented that AlphaFold could help reveal the functions of thousands of unsolved proteins in genomes, and make sense of disease-causing genetic variations that differ between people.

The arrival of AlphaFold has also changed the public's impression of DeepMind. Previously, people knew DeepMind as the team that used AI to play games, as with AlphaGo. Now that AlphaFold has moved into biology with protein structure prediction, DeepMind is sending a different message to the outside world: AI can not only play Go but also support the long-term development of the life sciences.


Link:https://new.qq.com/omn/20201201/20201201A0550G00.html

update time:2020-12-01 14:48:12
