Huazhong Agricultural University Faculty Homepage Platform huxuehai--Home-- A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions

胡学海

Supervisor of Doctorate Candidates

Supervisor of Master's Candidates

Name (Simplified Chinese):胡学海

Name (Pinyin):huxuehai

Administrative Position:Head of the Department of Big Data

Professional Title:Professor

Education Level:With Certificate of Graduation for Doctorate Study

Degree:Doctoral Degree in Science

Business Address:Room C308, Hubei Hongshan Laboratory

Alma Mater:Wuhan University

Teacher College:College of Informatics

School/Department:College of Informatics

Discipline:Other specialties in Statistics; Bioinformatics


Paper Publications

A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions

Release time:2021-04-30

Impact Factor:3.517

DOI number:10.3389/fgene.2019.01305

Journal:Frontiers in Genetics

Key Words:deep learning, pretraining, retraining, tissue-specific enhancers, prediction

Abstract:Deciphering the code of cis-regulatory elements (CREs) is one of the core issues of today's biology. Enhancers are distal CREs and play significant roles in gene transcriptional regulation. Although identification of enhancer locations across the whole genome [discriminative enhancer predictions (DEP)] is necessary, it is more important to predict in which specific cell or tissue types they will be activated and functional [tissue-specific enhancer predictions (TSEP)]. Although existing deep learning models have achieved great success in DEP, they cannot be directly employed in TSEP because a specific cell or tissue type has only a limited number of enhancer samples available for training. Here, we first adopted a reported deep learning architecture and then developed a novel training strategy named “pretraining-retraining strategy” (PRS) for TSEP by decomposing the whole training process into two successive stages: a pretraining stage is designed to train with the whole enhancer data for performing DEP, and a retraining stage is then designed to train with tissue-specific enhancer samples, starting from the trained pretraining model, for making TSEP. As a result, PRS is found to be valid for DEP with an AUC of 0.922 and a GM (geometric mean) of 0.696 when tested on a larger-scale FANTOM5 enhancer dataset via five-fold cross-validation. Interestingly, based on the trained pretraining model, a new finding is that only twenty additional epochs are needed to complete the retraining process when testing on 23 specific tissues or cell lines. For TSEP tasks, PRS achieved a mean GM of 0.806, which is significantly higher than the 0.528 of gkm-SVM, an existing mainstream method for CRE predictions. Notably, PRS is further proven superior to two other state-of-the-art methods: DEEP and BiRen. In summary, PRS has employed useful ideas from the domain of transfer learning and is a reliable method for TSEPs.
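The two-stage idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation: a plain logistic regression stands in for the deep learning architecture, and the names `train` and `pretrain_then_retrain`, the learning rate, and the 50 pretraining epochs are hypothetical choices made for illustration. Only the 20 retraining epochs come from the abstract. GM is assumed here to be the square root of sensitivity times specificity, a common definition for imbalanced classification that the abstract does not spell out.

```python
import math


def geometric_mean(y_true, y_pred):
    """GM = sqrt(sensitivity * specificity) (assumed definition)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1:
            if p == 1:
                tp += 1
            else:
                fn += 1
        else:
            if p == 0:
                tn += 1
            else:
                fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(weights, samples, epochs, lr=0.1):
    """Run SGD on the logistic loss, starting from the given weights.

    Returns a new weight list; the input list is left unchanged.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w


def pretrain_then_retrain(all_samples, tissue_samples, n_features,
                          pretrain_epochs=50, retrain_epochs=20):
    """Stage 1: fit on all enhancer data. Stage 2: continue on tissue data."""
    # Pretraining starts from zero weights (hypothetical initialisation).
    pretrained = train([0.0] * n_features, all_samples, pretrain_epochs)
    # Retraining starts from the pretrained weights, not from scratch.
    retrained = train(pretrained, tissue_samples, retrain_epochs)
    return pretrained, retrained
```

The point of the design is that the second call to `train` receives the pretrained weights as its starting point, so the small tissue-specific sample only has to adjust a model that has already learned from the full dataset.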

Indexed by:Journal paper

Volume:10

Page Number:1305

Translation or Not:no

Date of Publication:2020-01-01

Included Journals:SCI
