Volume 48 Issue 9
Sep.  2021

TaxonKit: A practical and efficient NCBI taxonomy toolkit

doi: 10.1016/j.jgg.2021.03.006
Funds:

We thank Yong-Xin Liu (State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China), Qi Zhao (Sun Yat-sen University Cancer Center, Guangzhou, China), Zhi-Luo Deng (Department of Computational Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany), and Cai-Yun Zhu (Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China) for their advice and comments on the manuscript. We are also grateful to the TaxonKit users who have helped greatly by reporting bugs and suggesting new features. We thank Daniel S. Standage (National Biodefense Analysis and Countermeasures Center, Fort Detrick, USA) for writing the Python bindings for TaxonKit. This work was supported by grants from the National Natural Science Foundation of China (32000474) to W.S. and the National Science and Technology Major Project of China (2017ZX10202203-007-001) to H.R.

  • Received Date: 2021-02-09
  • Accepted Date: 2021-03-27
  • Rev Recd Date: 2021-03-15
  • Publish Date: 2021-04-15
  • The National Center for Biotechnology Information (NCBI) Taxonomy is widely applied in biomedical and ecological studies. Typical demands include querying taxonomy identifiers (TaxIds) by taxonomy names, querying complete taxonomic lineages by TaxIds, listing descendants of given TaxIds, and others. However, existing tools are either limited in functionality or inefficient in terms of runtime. In this work, we present TaxonKit, a command-line toolkit for comprehensive and efficient manipulation of NCBI Taxonomy data. TaxonKit comprises seven core subcommands that provide TaxId querying, listing, and filtering, lineage retrieval and reformatting, lowest common ancestor computation, and TaxId change tracking. Its practical functions, competitive processing performance, scalability across datasets of different sizes, and good accessibility can facilitate taxonomy data manipulation. TaxonKit is freely available under the permissive MIT license on GitHub, Brewsci, and Bioconda. The documentation is available at https://bioinf.shenwei.me/taxonkit/.
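The lineage retrieval, descendant listing, and lowest common ancestor (LCA) operations named in the abstract can be illustrated on a toy tree. The sketch below is not TaxonKit's implementation and does not use its API. It assumes a small hand-made parent table whose TaxIds are hypothetical, and the function names `lineage`, `descendants`, and `lca` are chosen here for illustration only.

```python
# Toy parent table: child TaxId -> parent TaxId.
# These IDs are made up for illustration and are NOT real NCBI TaxIds.
# The root (1) is recorded as its own parent.
PARENT = {1: 1, 10: 1, 20: 1, 11: 10, 12: 10, 111: 11, 112: 11, 121: 12}


def lineage(taxid, parent=PARENT):
    """Return the path of TaxIds from the root down to `taxid`."""
    path = [taxid]
    while parent[taxid] != taxid:  # stop once the root is reached
        taxid = parent[taxid]
        path.append(taxid)
    return path[::-1]


def descendants(taxid, parent=PARENT):
    """Return `taxid` and every node below it, sorted by TaxId."""
    return sorted(t for t in parent if taxid in lineage(t, parent))


def lca(taxids, parent=PARENT):
    """Return the deepest node shared by the lineages of all `taxids`."""
    paths = [lineage(t, parent) for t in taxids]
    common = None
    for level in zip(*paths):  # walk the lineages in parallel from the root
        if len(set(level)) != 1:
            break
        common = level[0]
    return common
```

Walking every lineage for each query is quadratic in the worst case, which is acceptable for a toy table. A tool working on the full NCBI Taxonomy would presumably use precomputed indexes instead.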
