A big data pipeline: Identifying dynamic gene regulatory networks from time-course Gene Expression Omnibus data with applications to influenza infection

Michelle Carey, Juan Camilo Ramírez, Shuang Wu, Hulin Wu

Research output: Contribution to journal › Article

Abstract

A biological host response to an external stimulus or intervention such as a disease or infection is a dynamic process, which is regulated by an intricate network of many genes and their products. Understanding the dynamics of this gene regulatory network allows us to infer the mechanisms involved in a host response to an external stimulus, and hence aids the discovery of biomarkers of phenotype and biological function. In this article, we propose a modeling/analysis pipeline for dynamic gene expression data, called Pipeline4DGEData, which consists of a series of statistical modeling techniques to construct dynamic gene regulatory networks from the large volumes of high-dimensional time-course gene expression data that are freely available in the Gene Expression Omnibus repository. This pipeline has a consistent and scalable structure that allows it to simultaneously analyze a large number of time-course gene expression data sets, and then integrate the results across different studies. We apply the proposed pipeline to influenza infection data from nine studies and demonstrate that interesting biological findings can be discovered with its implementation.

Language: English
Pages: 1930-1955
Number of pages: 26
Journal: Statistical Methods in Medical Research
Volume: 27
Issue number: 7
DOI: 10.1177/0962280217746719
State: Published - Jul 1 2018

Fingerprint

Gene Regulatory Networks
Influenza
Gene Regulatory Network
Gene Expression Data
Human Influenza
Gene Expression
Infection
Statistical Modeling
Biomarkers
Dynamic Process
Phenotype
Repository
High-dimensional
Integrate
Gene
Series
Modeling
Demonstrate

Keywords

  • differential equations
  • Gene Expression Omnibus
  • gene regulatory network
  • Time-course data

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability
  • Health Information Management

Cite this

A big data pipeline: Identifying dynamic gene regulatory networks from time-course Gene Expression Omnibus data with applications to influenza infection. / Carey, Michelle; Ramírez, Juan Camilo; Wu, Shuang; Wu, Hulin.

In: Statistical Methods in Medical Research, Vol. 27, No. 7, 01.07.2018, p. 1930-1955.

@article{bc10d75fe96a4398a7e2d58fa5b179af,
title = "A big data pipeline: Identifying dynamic gene regulatory networks from time-course Gene Expression Omnibus data with applications to influenza infection",
abstract = "A biological host response to an external stimulus or intervention such as a disease or infection is a dynamic process, which is regulated by an intricate network of many genes and their products. Understanding the dynamics of this gene regulatory network allows us to infer the mechanisms involved in a host response to an external stimulus, and hence aids the discovery of biomarkers of phenotype and biological function. In this article, we propose a modeling/analysis pipeline for dynamic gene expression data, called Pipeline4DGEData, which consists of a series of statistical modeling techniques to construct dynamic gene regulatory networks from the large volumes of high-dimensional time-course gene expression data that are freely available in the Gene Expression Omnibus repository. This pipeline has a consistent and scalable structure that allows it to simultaneously analyze a large number of time-course gene expression data sets, and then integrate the results across different studies. We apply the proposed pipeline to influenza infection data from nine studies and demonstrate that interesting biological findings can be discovered with its implementation.",
keywords = "differential equations, Gene Expression Omnibus, gene regulatory network, Time-course data",
author = "Carey, Michelle and Ram{\'i}rez, Juan Camilo and Wu, Shuang and Wu, Hulin",
year = "2018",
month = jul,
day = "1",
doi = "10.1177/0962280217746719",
language = "English",
volume = "27",
pages = "1930--1955",
journal = "Statistical Methods in Medical Research",
issn = "0962-2802",
publisher = "SAGE Publications Ltd",
number = "7",
}

TY - JOUR

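T1 - A big data pipeline: Identifying dynamic gene regulatory networks from time-course Gene Expression Omnibus data with applications to influenza infection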

T2 - Statistical Methods in Medical Research

AU - Carey, Michelle

AU - Ramírez, Juan Camilo

AU - Wu, Shuang

AU - Wu, Hulin

PY - 2018/7/1

Y1 - 2018/7/1

AB - A biological host response to an external stimulus or intervention such as a disease or infection is a dynamic process, which is regulated by an intricate network of many genes and their products. Understanding the dynamics of this gene regulatory network allows us to infer the mechanisms involved in a host response to an external stimulus, and hence aids the discovery of biomarkers of phenotype and biological function. In this article, we propose a modeling/analysis pipeline for dynamic gene expression data, called Pipeline4DGEData, which consists of a series of statistical modeling techniques to construct dynamic gene regulatory networks from the large volumes of high-dimensional time-course gene expression data that are freely available in the Gene Expression Omnibus repository. This pipeline has a consistent and scalable structure that allows it to simultaneously analyze a large number of time-course gene expression data sets, and then integrate the results across different studies. We apply the proposed pipeline to influenza infection data from nine studies and demonstrate that interesting biological findings can be discovered with its implementation.

KW - differential equations

KW - Gene Expression Omnibus

KW - gene regulatory network

KW - Time-course data

UR - http://www.scopus.com/inward/record.url?scp=85047948513&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85047948513&partnerID=8YFLogxK

U2 - 10.1177/0962280217746719

DO - 10.1177/0962280217746719

M3 - Article

VL - 27

SP - 1930

EP - 1955

JO - Statistical Methods in Medical Research

JF - Statistical Methods in Medical Research

SN - 0962-2802

IS - 7

ER -