Volume 48, Issue 3, Mar. 2021

Pathway analysis for genome-wide genetic variation data: Analytic principles, latest developments, and new opportunities

doi: 10.1016/j.jgg.2021.01.007
More Information
  • Corresponding author: Phil H. Lee (E-mail: PLEE0@mgh.harvard.edu)
  • Received Date: 2020-11-04
  • Revision Received Date: 2021-01-24
  • Accepted Date: 2021-01-25
  • Available Online: 2021-02-26
  • Publish Date: 2021-03-20
  • Pathway analysis, also known as gene-set enrichment analysis, is a multilocus analytic strategy that integrates a priori biological knowledge into the statistical analysis of high-throughput genetics data. Originally developed for the study of gene expression data, it has become a powerful analytic procedure for in-depth mining of genome-wide genetic variation data. Remarkable discoveries have been made in recent years, uncovering genes and biological mechanisms underlying common complex disorders. However, as massive amounts of diverse functional genomics data accrue, there is a pressing need for new generations of pathway analysis methods that can utilize multiple layers of high-throughput genomics data. In this review, we provide an intellectual foundation for this powerful analytic strategy, as well as an update on the state of the art in recent method development. The goal of this review is threefold: (1) to introduce the motivation and basic steps of pathway analysis for genome-wide genetic variation data; (2) to review the merits and shortcomings of classic and newly emerging integrative pathway analysis tools; and (3) to discuss remaining challenges and future directions for method development.
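The simplest form of the strategy described in the abstract, an over-representation test, can be sketched in Python as follows. The sketch assumes that gene-level association p-values have already been derived from SNP-level GWAS results (that mapping step is not shown), calls genes with p below a chosen threshold "associated", and uses a one-sided hypergeometric test to ask whether a gene set contains more associated genes than expected by chance; a Benjamini-Hochberg adjustment then controls the false discovery rate across gene sets. The function names, the 0.05 threshold, and the toy gene names are illustrative assumptions, not the interface of any tool discussed in this review. The test also treats genes as independent, an assumption that linkage disequilibrium between neighboring genes usually violates in GWAS data, which is one reason dedicated methods model gene size and inter-gene correlation.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total


def over_representation_p(gene_pvalues, gene_set, alpha=0.05):
    """One-sided over-representation p-value for a single gene set.

    gene_pvalues: dict of gene name -> gene-level p-value (assumed precomputed)
    gene_set: iterable of gene names; genes absent from gene_pvalues are ignored
    alpha: threshold for calling a gene "associated" (an illustrative assumption)
    Assumes genes are independent, which LD typically violates in GWAS data.
    """
    universe = set(gene_pvalues)
    members = set(gene_set) & universe
    hits = {g for g, p in gene_pvalues.items() if p < alpha}
    k = len(members & hits)
    return hypergeom_sf(k, len(universe), len(hits), len(members))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, keeping the running minimum of m*p/rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical toy data: six genes and two gene sets.
    gene_p = {"G1": 0.001, "G2": 0.02, "G3": 0.40, "G4": 0.70, "G5": 0.03, "G6": 0.90}
    gene_sets = {"setA": ["G1", "G2", "G5"], "setB": ["G3", "G4", "G6"]}
    raw = [over_representation_p(gene_p, s) for s in gene_sets.values()]
    print(raw)                      # [0.05, 1.0]
    print(benjamini_hochberg(raw))  # [0.1, 1.0]
```

In this toy example all three associated genes fall in setA, giving a raw p-value of 1/20 = 0.05 before multiple-testing adjustment.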