Huazhong Agricultural University Faculty Homepage Platform - Hu Xuehai - A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions

Hu Xuehai

Doctoral supervisor

Master's supervisor

Name: Hu Xuehai

Pinyin name: huxuehai

Position: Head of the Department of Big Data

Title: Professor

Education: Doctoral graduate

Degree: Doctor of Science

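Office: Room C308, Hubei Hongshan Laboratory

Email:

Alma mater: Wuhan University

Department: College of Informatics

Affiliation: College of Informatics

Discipline: Statistics; other specialty: Bioinformatics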

Publications

A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions

Posted: 2021-04-30

Impact factor: 3.517

DOI: 10.3389/fgene.2019.01305

Journal: Frontiers in Genetics

Keywords: deep learning, pretraining, retraining, tissue-specific enhancers, prediction

Abstract: Deciphering the code of cis-regulatory elements (CREs) is one of the core issues in biology today. Enhancers are distal CREs that play significant roles in the transcriptional regulation of genes. Although identifying enhancer locations across the whole genome [discriminative enhancer prediction (DEP)] is necessary, it is more important to predict in which specific cell or tissue types they will be active and functional [tissue-specific enhancer prediction (TSEP)]. Although existing deep learning models have achieved great success in DEP, they cannot be applied directly to TSEP because any specific cell or tissue type has only a limited number of enhancer samples available for training. Here, we first adopted a previously reported deep learning architecture and then developed a novel training strategy for TSEP, named the "pretraining-retraining strategy" (PRS), which decomposes the whole training process into two successive stages: a pretraining stage that trains on the whole enhancer dataset to perform DEP, and a retraining stage that starts from the trained pretraining model and trains on tissue-specific enhancer samples to make TSEP. As a result, PRS proved valid for DEP, with an AUC of 0.922 and a GM (geometric mean) of 0.696 when tested on a larger-scale FANTOM5 enhancer dataset via five-fold cross-validation. Interestingly, a new finding is that, starting from the trained pretraining model, only twenty additional epochs were needed to complete retraining across the 23 specific tissues or cell lines tested. For TSEP tasks, PRS achieved a mean GM of 0.806, significantly higher than the 0.528 of gkm-SVM, an existing mainstream method for CRE prediction. Notably, PRS also proved superior to two other state-of-the-art methods, DEEP and BiRen. In summary, PRS draws on useful ideas from transfer learning and is a reliable method for TSEP.
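
The abstract reports GM scores without defining GM. In enhancer-prediction work, GM commonly denotes the geometric mean of sensitivity and specificity; assuming that definition (it is not stated on this page), the minimal Python sketch below computes it from binary labels. The helper name `geometric_mean_score` and the toy labels are hypothetical and illustrative only, not the paper's code or data.

```python
import math
from typing import Sequence


def geometric_mean_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Hypothetical helper: GM = sqrt(sensitivity * specificity).

    Assumes binary labels, 1 = enhancer, 0 = non-enhancer. The GM definition
    is an assumption, not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1:
            if p == 1:
                tp += 1  # enhancer correctly predicted
            else:
                fn += 1  # enhancer missed
        else:
            if p == 1:
                fp += 1  # non-enhancer predicted as enhancer
            else:
                tn += 1  # non-enhancer correctly rejected
    # Sensitivity = recall on positives; specificity = recall on negatives.
    # A class with no samples contributes 0.0 instead of raising ZeroDivisionError.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # Toy example: sensitivity = 3/4, specificity = 2/4.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
    print(round(geometric_mean_score(y_true, y_pred), 4))  # 0.6124
```

Because GM multiplies the two per-class rates, a classifier that labels every sequence as an enhancer scores 0 even though its sensitivity is 1, which makes GM more informative than plain accuracy when positive and negative counts differ.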

Paper type: Journal article

Volume: 10

Pages: 1305

Translated work: No

Publication date: 2020-01-01

Indexed in: SCI
