Volume 51, Issue 6, Jun. 2024

RNAirport: a deep neural network-based database characterizing representative gene models in plants

doi: 10.1016/j.jgg.2024.03.004
Funds: This study was supported by grants from the National Key R&D Program of China (2023ZD04073), the Major Project of Hubei Hongshan Laboratory (2022hszd016), the Key Research and Development Program of Hubei Province (2022BFE003), and the National Natural Science Foundation of China (32070284) to G. Xu.

  • Received Date: 2024-03-03
  • Accepted Date: 2024-03-16
  • Rev Recd Date: 2024-03-15
  • Available Online: 2025-06-06
  • Publish Date: 2024-03-20
  • A 5′-leader, known initially as the 5′-untranslated region, can exist as multiple isoforms owing to alternative splicing (aS) and alternative transcription start sites (aTSS). A representative 5′-leader is therefore needed to examine how the embedded RNA regulatory elements control translation efficiency. Here, we develop a ranking algorithm and a deep-learning model to annotate representative 5′-leaders for five plant species. We rank the intra-sample and inter-sample frequency of aS-mediated transcript isoforms using a Kruskal–Wallis test-based algorithm and identify the representative aS-5′-leader. To further assign a representative 5′-end, we train the deep-learning model 5′leaderP to learn aTSS-mediated 5′-end distribution patterns from cap-analysis gene expression data. The model accurately predicts the 5′-end, as confirmed experimentally in Arabidopsis and rice. The gene models containing representative 5′-leaders, together with 5′leaderP, can be accessed at RNAirport (http://www.rnairport.com/leader5P/). The Stage 1 annotation of 5′-leaders records 5′-leader diversity and will pave the way to Ribo-Seq open-reading frame annotation, similar to the project recently initiated by human GENCODE.
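The abstract names a Kruskal–Wallis test-based ranking of isoform frequency but does not spell out the procedure. Purely as an illustration of the underlying statistic, the sketch below pools per-sample isoform abundances for one gene, computes the tie-corrected Kruskal–Wallis H, and orders isoforms by mean pooled rank. The function names, the input layout, and the use of mean pooled rank to pick a representative isoform are assumptions made for this example, not the authors' implementation.

```python
from itertools import chain


def pooled_ranks(groups):
    """Map each distinct value to its average rank in the pooled data.

    Also returns the tie term sum(t^3 - t) used for the tie correction.
    """
    values = sorted(chain.from_iterable(groups))
    rank_of, tie_term, i = {}, 0, 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        t = j - i                              # size of this tie block
        rank_of[values[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j
    return rank_of, tie_term


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    n = sum(len(g) for g in groups)
    rank_of, tie_term = pooled_ranks(groups)
    rank_sq = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sq - 3 * (n + 1)
    correction = 1 - tie_term / (n**3 - n)
    return h / correction if correction > 0 else 0.0


def rank_isoforms(usage):
    """Hypothetical helper: usage maps isoform id -> per-sample abundances.

    Returns (H, isoform ids ordered from highest to lowest mean pooled rank).
    """
    groups = list(usage.values())
    rank_of, _ = pooled_ranks(groups)
    mean_rank = {
        iso: sum(rank_of[v] for v in vals) / len(vals)
        for iso, vals in usage.items()
    }
    order = sorted(usage, key=lambda iso: mean_rank[iso], reverse=True)
    return kruskal_wallis_h(groups), order


# Made-up relative abundances of three isoforms across three samples.
example = {
    "iso1": [0.70, 0.80, 0.75],
    "iso2": [0.20, 0.15, 0.20],
    "iso3": [0.10, 0.05, 0.05],
}
h_stat, ordered = rank_isoforms(example)
```

Under these assumptions, the first entry of `ordered` would be the candidate representative isoform, and a large `h_stat` indicates that isoform abundances differ consistently across samples. A real pipeline would also need a p-value (H is approximately chi-squared with k − 1 degrees of freedom) and a defined way to combine intra-sample and inter-sample evidence, which the abstract does not describe.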
