
Hu Xuehai (胡学海)

Supervisor of Doctorate Candidates
Supervisor of Master's Candidates
Name (Simplified Chinese):胡学海
Name (Pinyin):Hu Xuehai
Administrative Position:Head of the Department of Big Data
Professional Title:Professor
Education Level:Doctoral graduate
Degree:Doctoral Degree in Science
Business Address:Room C609, Yifu Building
Alma Mater:Wuhan University
Teacher College:College of Informatics
School/Department:College of Informatics
Discipline:Other specialties in Statistics; Bioinformatics

Paper Publications
A directed learning strategy integrating multiple omic data improves genomic prediction
Release time:2021-04-30

DOI number:10.1111/pbi.13117

Journal:Plant Biotechnology Journal

Key Words:directed learning, genetic features, genomic prediction, LASSO, multiple omic data

Abstract:Genomic prediction (GP) aims to construct a statistical model for predicting phenotypes using genome-wide markers and is a promising strategy for accelerating molecular plant breeding. However, current progress of phenotype prediction using genomic data alone has reached a bottleneck, and previous studies on transcriptomic and metabolomic predictions ignored genomic information. Here, we designed a novel strategy of GP called multilayered least absolute shrinkage and selection operator (MLLASSO) by integrating multiple omic data into a single model that iteratively learns three layers of genetic features (GFs) supervised by observed transcriptome and metabolome. Significantly, MLLASSO learns higher order information of gene interactions, which enables us to achieve a significant improvement of predictability of yield in rice from 0.1588 (GP alone) to 0.2451 (MLLASSO). In the prediction of the first two layers, some genes were found to be genetically predictable genes (GPGs) as their expressions were accurately predicted with genetic markers. Interestingly, we made three dramatic discoveries for the GPGs: (i) GPGs are good predictors for highly complex traits like yield; (ii) GPGs are mostly eQTL genes (cis or trans); and (iii) trait-related transcriptional factor families are enriched in GPGs. These findings support the notion that learned GFs not only are good predictors for traits but also have specific biological implications regarding regulation of gene expressions. To differentiate the new method from conventional GP models, we called MLLASSO a directed learning strategy supervised by intermediate omic data. This new prediction model appears to be more reliable and more robust than conventional GP models.

Indexed by:Journal paper

Discipline:Natural Science

First-Level Discipline:Biology

Volume:17

Page Number:2011–2020

Translation or Not:no

Date of Publication:2019-01-01

Included Journals:SCI