胡学海
Impact factor: 3.517
DOI: 10.3389/fgene.2019.01305
Journal: Frontiers in Genetics
Keywords: deep learning, pretraining, retraining, tissue-specific enhancers, prediction
Abstract: Deciphering the code of cis-regulatory elements (CREs) is one of the core issues of today’s biology. Enhancers are distal CREs and play significant roles in gene transcriptional regulation. Although identification of enhancer locations across the whole genome [discriminative enhancer predictions (DEP)] is necessary, it is more important to predict in which specific cell or tissue types they will be activated and functional [tissue-specific enhancer predictions (TSEP)]. Although existing deep learning models have achieved great success in DEP, they cannot be directly employed in TSEP because a specific cell or tissue type has only a limited number of available enhancer samples for training. Here, we first adopted a reported deep learning architecture and then developed a novel training strategy named “pretraining-retraining strategy” (PRS) for TSEP by decomposing the whole training process into two successive stages: a pretraining stage is designed to train with the whole enhancer data for performing DEP, and a retraining stage is then designed to train with tissue-specific enhancer samples based on the trained pretraining model for making TSEP. As a result, PRS is found to be valid for DEP with an AUC of 0.922 and a GM (geometric mean) of 0.696 when tested on a larger-scale FANTOM5 enhancer dataset via five-fold cross-validation. Interestingly, based on the trained pretraining model, a new finding is that only twenty additional epochs are needed to complete the retraining process when tested on 23 specific tissues or cell lines. For TSEP tasks, PRS achieved a mean GM of 0.806, which is significantly higher than the 0.528 of gkm-SVM, an existing mainstream method for CRE predictions. Notably, PRS is further proven superior to two other state-of-the-art methods: DEEP and BiRen. In summary, PRS has employed useful ideas from the domain of transfer learning and is a reliable method for TSEPs.
Paper type: Journal article
Volume: 10
Page range: 1305
Translated work: No
Publication date: 2020-01-01
Indexed in: SCI
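
Illustrative note: the two-stage idea described in the abstract (train on all enhancer data first, then continue for a few epochs on a small tissue-specific set) can be sketched roughly as below. This is not the authors' model or code. It assumes a toy logistic-regression classifier on synthetic data in place of the deep network, and it assumes GM means sqrt(sensitivity * specificity), a common definition that the abstract itself does not spell out. All names (`train`, `geometric_mean`, `make_data`, the weight vectors) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def train(X, y, w_init, epochs, lr=0.5):
    """Full-batch gradient descent on the logistic loss, starting from w_init."""
    w = list(w_init)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = predict_proba(w, x) - t
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def geometric_mean(y_true, y_pred):
    """Assumed definition: sqrt(sensitivity * specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sens * spec)


def make_data(rng, w_true, n):
    # Synthetic stand-in for enhancer features; labels come from a linear rule.
    X = [[rng.gauss(0, 1) for _ in w_true] for _ in range(n)]
    y = [1 if sum(a * b for a, b in zip(w_true, x)) > 0 else 0 for x in X]
    return X, y


rng = random.Random(0)
w_general = [1.0, -1.0, 0.5, 0.0]      # hypothetical "all enhancers" rule
w_tissue_true = [1.0, -0.5, 0.5, 1.0]  # related but shifted tissue-specific rule

X_all, y_all = make_data(rng, w_general, 500)     # large general dataset
X_tis, y_tis = make_data(rng, w_tissue_true, 40)  # small tissue-specific dataset

# Stage 1 (pretraining): learn from the whole dataset, starting from zeros.
w_pre = train(X_all, y_all, [0.0] * 4, epochs=100)

# Stage 2 (retraining): continue from the pretrained weights for 20 epochs
# on the small tissue-specific set.
w_tis = train(X_tis, y_tis, w_pre, epochs=20)

pred = [1 if predict_proba(w_tis, x) >= 0.5 else 0 for x in X_tis]
gm_tissue = geometric_mean(y_tis, pred)
```

The sketch scores the retrained model on its own training samples only to keep the example short; the abstract reports cross-validated results, so `gm_tissue` here is not comparable to the published figures.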