Hu Xuehai (胡学海)

Supervisor of Doctorate Candidates
Supervisor of Master's Candidates
Name (Simplified Chinese):胡学海
Name (Pinyin):huxuehai
Administrative Position:Head of the Department of Big Data
Professional Title:Professor
Education Level:With Certificate of Graduation for Doctorate Study
Degree:Doctoral Degree in Science
Business Address:Room C609, Yifu Building
E-Mail:
Alma Mater:Wuhan University
Teacher College:College of Informatics
School/Department:College of Informatics
Discipline:Other specialties in Statistics; Bioinformatics

Paper Publications
A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions
Release time:2021-04-30

Impact Factor:3.517

DOI number:10.3389/fgene.2019.01305

Journal:Frontiers in Genetics

Key Words:deep learning, pretraining, retraining, tissue-specific enhancers, prediction

Abstract:Deciphering the code of cis-regulatory element (CRE) is one of the core issues of today’s biology. Enhancers are distal CREs and play significant roles in gene transcriptional regulation. Although identifications of enhancer locations across the whole genome [discriminative enhancer predictions (DEP)] is necessary, it is more important to predict in which specific cell or tissue types, they will be activated and functional [tissue-specific enhancer predictions (TSEP)]. Although existing deep learning models achieved great successes in DEP, they cannot be directly employed in TSEP because a specific cell or tissue type only has a limited number of available enhancer samples for training. Here, we first adopted a reported deep learning architecture and then developed a novel training strategy named “pretraining-retraining strategy” (PRS) for TSEP by decomposing the whole training process into two successive stages: a pretraining stage is designed to train with the whole enhancer data for performing DEP, and a retraining strategy is then designed to train with tissue-specific enhancer samples based on the trained pretraining model for making TSEP. As a result, PRS is found to be valid for DEP with an AUC of 0.922 and a GM (geometric mean) of 0.696, when testing on a larger-scale FANTOM5 enhancer dataset via a five-fold cross-validation. Interestingly, based on the trained pretraining model, a new finding is that only additional twenty epochs are needed to complete the retraining process on testing 23 specific tissues or cell lines. For TSEP tasks, PRS achieved a mean GM of 0.806 which is significantly higher than 0.528 of gkm-SVM, an existing mainstream method for CRE predictions. Notably, PRS is further proven superior to other two state-of-the-art methods: DEEP and BiRen. In summary, PRS has employed useful ideas from the domain of transfer learning and is a reliable method for TSEPs.
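
Illustrative sketch (not the authors' code): the two-stage idea in the abstract can be shown on a toy model. The example below swaps the paper's deep learning architecture for a plain NumPy logistic regression, and all names (`train`, `geometric_mean`, `demo`) and the synthetic data are hypothetical. GM is assumed here to be the square root of sensitivity times specificity; the abstract only calls it a geometric mean. Only the twenty retraining epochs are taken from the abstract.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train(X, y, w=None, epochs=100, lr=0.1):
    """Gradient-descent logistic regression (a stand-in for the deep model).

    If `w` is given, training continues from those weights, which is the
    "retraining" step; otherwise it starts from zeros ("pretraining").
    """
    w = np.zeros(X.shape[1]) if w is None else np.asarray(w, dtype=float).copy()
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w


def geometric_mean(y_true, y_pred):
    """GM, assumed to be sqrt(sensitivity * specificity)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = np.sum(y_true == 1)
    neg = np.sum(y_true == 0)
    if pos == 0 or neg == 0:
        return 0.0  # undefined without both classes; report 0 by convention
    sens = np.sum((y_true == 1) & (y_pred == 1)) / pos
    spec = np.sum((y_true == 0) & (y_pred == 0)) / neg
    return float(np.sqrt(sens * spec))


def demo(seed=0):
    """Pretrain on a large pooled set, then retrain on a small specific set."""
    rng = np.random.default_rng(seed)
    # Synthetic stand-ins: a large "all enhancers" set and a small
    # "tissue-specific" set whose labels follow a related rule.
    X_all = rng.normal(size=(500, 8))
    y_all = (X_all[:, 0] > 0).astype(float)
    X_tis = rng.normal(size=(40, 8))
    y_tis = (X_tis[:, 0] + 0.5 * X_tis[:, 1] > 0).astype(float)

    w_pre = train(X_all, y_all, epochs=200)            # stage 1: pretraining
    w_re = train(X_tis, y_tis, w=w_pre, epochs=20)     # stage 2: retraining
    pred = (sigmoid(X_tis @ w_re) > 0.5).astype(int)
    return w_pre, w_re, geometric_mean(y_tis.astype(int), pred)
```

Calling `demo()` returns the pretrained weights, the retrained weights, and the GM on the small set. The point of the design is that stage 2 starts from the stage 1 weights rather than from scratch, so a few epochs on limited data can be enough.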

Indexed by:Journal paper

Volume:10

Page Number:1305

Translation or Not:no

Date of Publication:2020-01-01

Included Journals:SCI