
Hu Xuehai (胡学海)

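Doctoral supervisor
Master's supervisor
Name: Hu Xuehai
Name in pinyin: huxuehai
Position: Head of the Department of Big Data
Academic title: Professor
Education: Doctoral graduate
Degree: Doctor of Science
Office: Yifu Building, Room C609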
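Alma mater: Wuhan University
Department: College of Informatics
Affiliation: College of Informatics
Disciplines: Statistics (other specialties); Bioinformatics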
Publications
A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions
Posted: 2021-04-30

Impact factor: 3.517

DOI: 10.3389/fgene.2019.01305

Journal: Frontiers in Genetics

Keywords: deep learning, pretraining, retraining, tissue-specific enhancers, prediction

Abstract: Deciphering the code of cis-regulatory elements (CREs) is one of the core issues of today's biology. Enhancers are distal CREs and play significant roles in gene transcriptional regulation. Although identification of enhancer locations across the whole genome [discriminative enhancer predictions (DEP)] is necessary, it is more important to predict in which specific cell or tissue types they will be activated and functional [tissue-specific enhancer predictions (TSEP)]. Although existing deep learning models achieved great successes in DEP, they cannot be directly employed in TSEP because a specific cell or tissue type only has a limited number of available enhancer samples for training. Here, we first adopted a reported deep learning architecture and then developed a novel training strategy named “pretraining-retraining strategy” (PRS) for TSEP by decomposing the whole training process into two successive stages: a pretraining stage is designed to train with the whole enhancer data for performing DEP, and a retraining stage is then designed to train with tissue-specific enhancer samples based on the trained pretraining model for making TSEP. As a result, PRS is found to be valid for DEP with an AUC of 0.922 and a GM (geometric mean) of 0.696, when testing on a larger-scale FANTOM5 enhancer dataset via a five-fold cross-validation. Interestingly, based on the trained pretraining model, a new finding is that only an additional twenty epochs are needed to complete the retraining process when testing on 23 specific tissues or cell lines. For TSEP tasks, PRS achieved a mean GM of 0.806, which is significantly higher than the 0.528 of gkm-SVM, an existing mainstream method for CRE predictions. Notably, PRS is further proven superior to two other state-of-the-art methods: DEEP and BiRen. In summary, PRS has employed useful ideas from the domain of transfer learning and is a reliable method for TSEPs.
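
The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' deep learning architecture: a plain logistic regression stands in for the network, the data are synthetic, and every name here (`train`, `pretrain_then_retrain`, `toy_data`, `geometric_mean`) is hypothetical. GM is assumed to mean the geometric mean of sensitivity and specificity, which the abstract does not spell out. The default of 20 retraining epochs echoes the abstract; the 50 pretraining epochs are an arbitrary choice.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(weights, samples, labels, epochs, lr=0.1):
    """Run `epochs` passes of SGD on a logistic model, starting from `weights`."""
    w = list(weights)  # copy, so the caller's weights are left untouched
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w


def pretrain_then_retrain(all_x, all_y, tissue_x, tissue_y,
                          pretrain_epochs=50, retrain_epochs=20):
    """Stage 1: fit on the whole data set. Stage 2: continue from those
    weights for a few epochs on the small tissue-specific set."""
    n_features = len(all_x[0])
    pretrained = train([0.0] * n_features, all_x, all_y, pretrain_epochs)
    retrained = train(pretrained, tissue_x, tissue_y, retrain_epochs)
    return pretrained, retrained


def geometric_mean(weights, samples, labels):
    """Assumed definition: sqrt(sensitivity * specificity) at a 0.5 threshold."""
    tp = fn = tn = fp = 0
    for x, y in zip(samples, labels):
        p = sigmoid(sum(wi * xi for wi, xi in zip(weights, x)))
        pred = 1 if p >= 0.5 else 0
        if y == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sens * spec)


def toy_data(rng, n, shift):
    """Synthetic samples: two Gaussian features plus a constant bias term."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        centre = shift if y == 1 else -shift
        xs.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0), 1.0])
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    all_x, all_y = toy_data(rng, 500, 1.0)       # large "whole enhancer" set
    tissue_x, tissue_y = toy_data(rng, 30, 1.5)  # small tissue-specific set
    pre, post = pretrain_then_retrain(all_x, all_y, tissue_x, tissue_y)
    print("GM before retraining:", geometric_mean(pre, tissue_x, tissue_y))
    print("GM after retraining: ", geometric_mean(post, tissue_x, tissue_y))
```

The point of the warm start is that the second call to `train` begins from the pretrained weights rather than from zeros, so the small tissue-specific set only has to adjust an already useful model.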

Paper type: Journal article

Volume: 10

Pages: 1305


Publication date: 2020-01-01

Indexed in: SCI