Application Note: MetAmp_AppNote.pdf

Documentation in one file: UserGuide.pdf

Program description

MetAmp is a tool for analyzing amplicon data by combining several marker regions of 16S rRNA genes. Such marker regions serve as unique identifiers for species. The bacterial 16S rRNA gene contains nine such regions (the hypervariable regions V1 to V9).

Installation

Make sure that you have a stable Internet connection and that R, Python (version 2.7.6 at least, since older versions may not support some features), and GCC are installed. Download or clone MetAmp from GitHub and save it in a folder of your choice. Then simply run make. This checks for the required packages and installs any that are missing.

$cd MetAmp_home_directory
$make

In some cases you need administrator privileges. If so, build MetAmp with the following command:

$sudo make
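
Before running make, it can help to confirm that the tools listed above are reachable from your shell. The following is a minimal sketch, not part of MetAmp: the function name missing_tools and the executable names in REQUIRED_TOOLS are assumptions, and the script itself needs Python 3.3 or newer because it relies on shutil.which.

```python
import shutil

# Executable names are assumptions; adjust them to match your system
# (for example, "python2" or "python2.7" instead of "python").
REQUIRED_TOOLS = ["R", "python", "gcc", "make"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found; you can run make.")
```

This only checks that the executables exist. It does not check their versions or the R and Python packages that make installs.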

Quick start

Before analysis, you may need to relabel your read headers: each read header should contain a barcodelabel field (see www.drive5.com for details), for example:

@read1;barcodelabel=TCAG;
GATGAACGCTGGCGGCGTGCCTAATACATGCAAGT...AT
+
IIIAAAIIIIIIIIIIIIIIIHHHIAAAAAAAAAA...II

If you use Illumina sequence data, you also have to merge overlapping paired-end reads. I will provide scripts for both of these steps later; for now, you can use the scripts available at www.drive5.com.
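
Until dedicated scripts are available, the relabeling step can be sketched in a few lines of Python. This is an illustration only, not the MetAmp or drive5 script: the function name relabel_fastq is hypothetical, the input is assumed to be plain four-line FASTQ records, and the barcode that belongs to the reads is assumed to be known already.

```python
def relabel_fastq(lines, barcode):
    """Yield FASTQ lines with ';barcodelabel=<barcode>;' added to each header.

    Assumes four-line records: header, sequence, '+', qualities.
    """
    # Drop blank lines so that the four-line record arithmetic stays valid.
    records = (line.rstrip("\n") for line in lines if line.strip())
    for index, line in enumerate(records):
        if index % 4 == 0:
            # Header line: keep the read identifier (the text before the
            # first whitespace or ';') and append the barcode label.
            read_id = line.split()[0].split(";")[0]
            line = "{0};barcodelabel={1};".format(read_id, barcode)
        yield line


if __name__ == "__main__":
    example = ["@read1 extra text", "GATGAACG", "+", "IIIAAAII"]
    print("\n".join(relabel_fastq(example, "TCAG")))
```

For the toy record above, the header becomes @read1;barcodelabel=TCAG; and the other three lines are passed through unchanged. Merging paired-end reads is a separate step that this sketch does not cover.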

To run the program, you have to provide the reference libraries, your amplicon data, and an output directory where all analysis results will be stored.

One marker region:

$python metamp.py -r data/gold21/gold21.fasta -r1 data/gold21/gold21_V13V31.fasta -l1 data/even/SRR072220_V13V31_relabeled.fastq -o test
Here:

    metamp.py - the program name.

    -r data/gold21/gold21.fasta - the reference file that contains whole 16S sequences, in forward and reverse complement.

    -r1 data/gold21/gold21_V13V31.fasta - the reference file that contains marker (region V1-3) sequences extracted from gold21.fasta (whole 16S), in forward and reverse complement.

    -l1 data/even/SRR072220_V13V31_relabeled.fastq - empirical amplicon reads. "relabeled" means that a barcode sequence was attached to each read label.

    -o test - the output directory for all analysis results.

Three markers:

$python metamp.py -r data/gold21/gold21.fasta -r1 data/gold21/gold21_V13V31.fasta -l1 data/even/SRR072220_V13V31_relabeled.fastq -r2 data/gold21/gold21_V35V53.fasta -l2 data/even/SRR072220_V35V53_relabeled.fastq -r3 data/gold21/gold21_V69V96.fasta -l3 data/even/SRR072239_V69V96_relabeled.fastq -o test

Here:

    -r2 data/gold21/gold21_V35V53.fasta - a reference file that contains marker (region V3-5) sequences extracted from gold21.fasta (whole 16S), in forward and reverse complement.

    -l2 data/even/SRR072220_V35V53_relabeled.fastq - amplicon empirical reads.

    -r3 data/gold21/gold21_V69V96.fasta - a reference file that contains marker (region V6-9) sequences extracted from gold21.fasta (whole 16S), in forward and reverse complement.

    -l3 data/even/SRR072239_V69V96_relabeled.fastq - amplicon empirical reads.
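
The option pattern in the examples above (-r for the whole 16S reference, then one -rN and -lN pair per marker, then -o) can be assembled programmatically when you script many runs. The sketch below is a hypothetical helper, not part of MetAmp: the name build_metamp_command is invented, and it only mirrors the one-marker and three-marker examples shown here, so it makes no claim about other marker counts.

```python
def build_metamp_command(reference, markers, output_dir, script="metamp.py"):
    """Return the argument list for a MetAmp run.

    `markers` is a list of (marker_reference, reads) pairs; the i-th pair
    becomes the options -r<i> and -l<i>, following the examples above.
    """
    command = ["python", script, "-r", reference]
    for number, (marker_reference, reads) in enumerate(markers, start=1):
        command += ["-r{0}".format(number), marker_reference,
                    "-l{0}".format(number), reads]
    command += ["-o", output_dir]
    return command


if __name__ == "__main__":
    command = build_metamp_command(
        "data/gold21/gold21.fasta",
        [("data/gold21/gold21_V13V31.fasta",
          "data/even/SRR072220_V13V31_relabeled.fastq"),
         ("data/gold21/gold21_V35V53.fasta",
          "data/even/SRR072220_V35V53_relabeled.fastq")],
        "test",
    )
    # Print the command line; pass `command` to subprocess.call to run it.
    print(" ".join(command))
```

Building the command as a list rather than a single string avoids quoting problems when file paths contain spaces.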

To get help from the program, you can run python metamp.py -h

Output files will be in the analysis directory that you provided with the -o option:

    clusters.clstr - contains a large table in uc format (see www.drive5.com).

    otu_table.txt - OTU table, where rows are OTUs and columns are barcodes.

    coordinates.crd - file that contains the NMDS coordinates of each point (reference and empirical; see the explanations in UserGuide.pdf).

    log.txt - a simple log file that records every stage of the analysis.
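
To work with the OTU table in your own scripts, something like the following may be enough. It is a sketch under stated assumptions: the exact layout of otu_table.txt is not described here, so the code assumes a tab-delimited file whose first row holds the barcode labels and whose first column holds the OTU identifiers. The function name read_otu_table and the toy table in the demo are made up for illustration. The code is written for Python 3.

```python
import csv
import io


def read_otu_table(handle, delimiter="\t"):
    """Parse an OTU table into {otu_id: {barcode: count}}.

    Assumes the first row is a header whose first cell labels the OTU
    column and whose remaining cells are barcode labels.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    barcodes = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip empty lines
        counts = [float(value) for value in row[1:]]
        table[row[0]] = dict(zip(barcodes, counts))
    return table


if __name__ == "__main__":
    # Invented toy data, only to show the assumed layout.
    toy = io.StringIO("OTUId\tTCAG\tAGCT\nOTU_1\t5\t0\nOTU_2\t1\t7\n")
    for otu_id, counts in read_otu_table(toy).items():
        print(otu_id, counts)
```

For a real run, open the file in the output directory instead, for example with open("test/otu_table.txt") as handle, and check the first few lines by eye to confirm that the layout assumption holds.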