Cite this article: | LI Yin, TAN Ying. Binding Activity Prediction of the Low-data G-protein Coupled Receptors Targets by Deep Learning of Knowledge-based Molecular Representations[J]. Chin J Mod Appl Pharm(中国现代应用药学), 2022, 39(21): 2842-2849. |
|
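Abstract: |
OBJECTIVE To build deep learning (DL) models with MolMapNet that predict the binding activity of compounds against 23 low-data (known activity data <250) G-protein coupled receptors (GPCRs), to assist the discovery of novel GPCR drugs. METHODS Activity datasets for the low-data GPCRs were collected from multiple databases and preprocessed, and DL models were built with MolMapNet; the resulting models were compared with published DL and ML models; the models were then evaluated on patented neuropeptide S receptor compounds. RESULTS Individual regression models were built for the 23 low-data GPCR targets. Under 10-fold cross-validation, the models achieved on the test sets a root mean square error of 0.3736-1.1998 (<1 for 20 targets), a mean absolute error of 0.2994-1.0083 (<1 for 21 targets), and R² of 0.1369-0.8107 (>0.5 for 15 targets, >0.6 for 9 targets); they performed comparably to published DL models for high-data GPCRs (known activity data >250) and to ML models for low-data targets; the models also performed well when used to predict the activity of compounds from patents. CONCLUSION The 23 regression models can predict the biological activity of compounds against specific targets and have the potential to screen for structurally novel drugs; MolMapNet can be used for activity prediction on low-data GPCRs. |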
Keywords: binding activity; deep learning; GPCR; low-data |
DOI: 10.13748/j.cnki.issn1007-7693.2022.21.021 |
CLC number: R914.2 |
Funding: National Key Research Program of China, Special Project on Synthetic Biology (2019YFA0905901) |
|
Binding Activity Prediction of the Low-data G-protein Coupled Receptors Targets by Deep Learning of Knowledge-based Molecular Representations |
LI Yin, TAN Ying
|
Tsinghua Shenzhen International Graduate School, Shenzhen 518055, China
|
Abstract: |
OBJECTIVE To construct deep learning (DL) models with MolMapNet that predict compound binding activity against each of 23 low-data G-protein coupled receptors (GPCRs; <250 known binders), in support of GPCR drug discovery. METHODS Binding activity datasets for the low-data GPCRs were collected from multiple databases and preprocessed, and DL models were built with MolMapNet; the resulting models were compared with published DL and machine learning (ML) models, and were evaluated on patented neuropeptide S receptor compounds. RESULTS Under 10-fold cross-validation, the MolMapNet DL models predicted the binding activity of test-set compounds for each GPCR with RMSE of 0.3736-1.1998 (<1 for 20 targets), MAE of 0.2994-1.0083 (<1 for 21 targets), and R² of 0.1369-0.8107 (>0.5 for 15 targets, >0.6 for 9 targets). The low-data models performed comparably to published DL models trained on higher-data GPCRs (>250 known binders). The models also performed well in predicting the activity of patented GPCR binders. CONCLUSION The 23 models can predict the biological activity of a compound against a specific target and have the potential to screen for structurally novel drugs; the MolMapNet architecture is useful for activity prediction against low-data GPCR targets. |
Key words: binding activity; deep learning; G-protein coupled receptors; low-data |
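
The abstract reports RMSE, MAE and R² under 10-fold cross-validation. As a rough illustration of how such figures are typically computed, the following is a minimal sketch in plain Python. It does not use MolMapNet: a one-variable least-squares line (`fit_line`) stands in for the model, and all function names here are hypothetical. It also assumes R² means the coefficient of determination; the paper may define it differently (for example, as a squared correlation).

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination (assumed definition of R2)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def fit_line(xs, ys):
    """Stand-in model: ordinary least-squares line y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cross_validate(xs, ys, k=10):
    """Return the mean (RMSE, MAE, R2) over the k held-out folds."""
    scores = []
    for train, test in kfold_indices(len(xs), k):
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        y_true = [ys[i] for i in test]
        y_pred = [slope * xs[i] + intercept for i in test]
        scores.append((rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred)))
    return [sum(col) / k for col in zip(*scores)]
```

Calling `cross_validate(xs, ys)` on a list of descriptor values and measured activities returns the three fold-averaged scores. With very small datasets, a held-out fold may contain only a few compounds, so per-fold R² can be unstable; pooling the held-out predictions before scoring is a common alternative.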