The number of metagenomes is increasing rapidly. However, current methods for metagenomic analysis are limited in their capability for in-depth data mining among a large number of microbiomes, each of which carries a complex community structure. Moreover, the complexity of configuring and operating computational pipelines also hinders efficient data processing for end users. In this work we introduce Parallel-META 3, a comprehensive and fully automatic computational toolkit for rapid data mining among metagenomic datasets, with advanced features including 16S rRNA extraction for shotgun sequences, 16S rRNA copy number calibration, 16S rRNA based functional prediction, diversity statistics, biomarker selection, interaction network construction, vector-graph-based visualization and parallel computing. Application of Parallel-META 3 to 5,337 samples with 1,117,555,208 sequences from diverse studies and platforms showed that it could produce results similar to those of QIIME and PICRUSt with much faster speed and lower memory usage, which demonstrates its ability to unravel taxonomical and functional dynamics patterns across large datasets and to elucidate ecological links between the microbiome and the environment. Parallel-META 3 is implemented in C/C++ and R, and integrated into an executable package for rapid installation and easy access under Linux and Mac OS X.
Both binary and source code packages are available at.
With the rapid development of Next Generation Sequencing technologies, metagenome datasets have been increasing explosively, both in sample number and in sequence volume. Data mining across hundreds or even thousands of metagenomes promises to uncover highly valuable biological information, such as a landscape view of microbiota structure and function 1, or the fine association between microbiota dynamics and human health status 2 or environmental factors 3. Such integrated and in-depth comparison of taxonomical structure or functional profile in large-scale metagenomics datasets has become important or even essential in many microbiota-enabled applications 4.
A number of methods have been developed for metagenomic analysis. MetaPhlAn 5 profiles microbial community composition using a universal biomarker gene, yet it was designed only for shotgun metagenome datasets, and lacks in-depth analysis procedures such as quantitative similarity calculation and diversity evaluation among multiple samples. Mothur 6 and QIIME 7 are widely used toolkits for analysing 16S rRNA amplicon based metagenome datasets; however, their computing throughput has become a bottleneck. Moreover, they both require dozens of dependency packages, so installation, configuration and operation are tedious and complicated.
Here we propose Parallel-META 3, a comprehensive and automatic software package (Fig. 1) that provides rapid data mining on taxonomy and metabolic function across a large number of metagenome datasets. Compared to previous versions 8, 9, its advanced features include 16S rRNA copy number calibration, 16S rRNA based functional prediction, diversity statistics, biomarker selection, and data visualization based on high-quality vector graphs. For high-performance computing over massive datasets, all processing steps in Parallel-META 3 are implemented using C/C++ and/or R with parallel computing techniques and a self-adapted load balancing strategy. In addition, this software is encapsulated and integrated into a well-configured and fully automatic pipeline for easy installation and a friendly user experience. Tests on 5,337 samples with 1,117,555,208 16S amplicon sequences showed that Parallel-META 3 is able to uncover dynamics of microbiomic features and patterns in both taxonomical and functional aspects from large-scale metagenome datasets, with higher efficiency, optimized memory usage and uncompromised precision.
On the other hand, in order to test the performance on shotgun metagenomics data, Dataset 4 was included, which contained 10 artificial metagenomic shotgun samples that were simulated with nine organisms using Dwgsim 12 (version 0.1.8). These 10 samples were divided into two groups based on the proportion of organisms to test the sensitivity of Parallel-META 3 in distinguishing samples with different community patterns (detailed design of simulation in Table S1 of Supplementary file S1).