Volume 45, Issue 9, Sep. 2018

CGPS: A machine learning-based approach integrating multiple gene set analysis tools for better prioritization of biologically relevant pathways

doi: 10.1016/j.jgg.2018.08.002
  • Corresponding author: Lei Kong (kongl@mail.cbi.pku.edu.cn)
  • Received Date: 2018-02-26
  • Accepted Date: 2018-08-13
  • Revision Received Date: 2018-08-11
  • Available Online: 2018-09-13
  • Publish Date: 2018-09-20
  • Gene set enrichment (GSE) analyses play an important role in the interpretation of large-scale transcriptome datasets. Because many GSE tools exist and their performance varies, obtaining optimal results from any single tool is challenging, which motivates integrating multiple tools into one method. Existing ensemble methods, however, report several different scores for sorting pathways, and it is difficult for users to choose the one ensemble score that yields the best final results. Here, we develop an ensemble method based on machine learning, called Combined Gene set analysis incorporating Prioritization and Sensitivity (CGPS), that integrates the results of nine prominent GSE tools into a single ensemble score (R score) used to sort pathways. To the best of our knowledge, CGPS is the first GSE ensemble method built on a priori knowledge of pathways and phenotypes. In an evaluation on 120 simulated datasets and 45 real datasets, we compare the R score with 10 widely used individual methods and five types of ensemble scores from two ensemble methods, and demonstrate that sorting pathways by the R score better prioritizes relevant pathways. Additionally, we apply CGPS to expression data for panobinostat, an anticancer drug used against multiple myeloma. The results identify cell processes associated with cancer, such as the p53 signaling pathway (hsa04115); by contrast, two ensemble methods (EnrichmentBrowser and EGSEA) rank this pathway outside the top 20, which may cause users to miss it in their analyses. We show that this method, which is based on a priori knowledge, can capture valuable biological information from numerous types of gene set collections, such as KEGG pathways, GO terms, Reactome, and BioCarta. CGPS is publicly available as standalone source code at ftp://ftp.cbi.pku.edu.cn/pub/CGPS_download/cgps-1.0.0.tar.gz.
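
To make the idea of an ensemble score concrete, the following is a minimal sketch of one generic way to merge pathway rankings from several GSE tools: averaging each pathway's rank across tools. It is not the CGPS R score, which the abstract describes only as a machine learning-based score built on a priori knowledge. The function names, the tool labels (`tool_a`, `tool_b`, `tool_c`), and all p-values are invented for illustration; only the pathway identifier hsa04115 comes from the abstract.

```python
from statistics import mean


def rank_pathways(pvalues):
    """Map each pathway to its rank (1 = smallest p-value).

    Tied p-values share the average of the ranks they span.
    """
    ordered = sorted(pvalues.items(), key=lambda kv: kv[1])
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to the end of the current run of equal p-values.
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        avg_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = avg_rank
        i = j + 1
    return ranks


def mean_rank_ensemble(tool_pvalues):
    """Sort pathways by their mean rank across tools (lower is better).

    Only pathways reported by every tool are scored.
    This is a generic rank-averaging scheme, NOT the CGPS R score.
    """
    per_tool_ranks = [rank_pathways(p) for p in tool_pvalues.values()]
    shared = set.intersection(*(set(r) for r in per_tool_ranks))
    scores = {pw: mean(r[pw] for r in per_tool_ranks) for pw in shared}
    # Break ties on the pathway identifier so the output order is stable.
    return sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))


# Invented p-values for three hypothetical tools.
example = {
    "tool_a": {"hsa04115": 0.001, "hsa04110": 0.02, "hsa04210": 0.30},
    "tool_b": {"hsa04115": 0.04, "hsa04110": 0.01, "hsa04210": 0.50},
    "tool_c": {"hsa04115": 0.002, "hsa04110": 0.03, "hsa04210": 0.03},
}

for pathway, score in mean_rank_ensemble(example):
    print(pathway, round(score, 3))
```

A pathway that one tool ranks poorly can still reach the top of the merged list when the other tools agree on it, which is the basic motivation for combining tools rather than relying on a single ranking.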