Data Format Overview

Two main types of file can be uploaded: a taxonomic profile (includes abundance and taxonomy information) and a metadata file.

Taxonomic profile formats

Taxonomic profiles derived from both amplified 16S rRNA census data and whole-genome shotgun metagenomic data can be uploaded. The following formats are accepted (accompanying example metadata files are also given):

  • mothur: uses both a consensus taxonomy file (download) and a .shared file (download). An example metadata file is also available (download).
  • BIOM format (from QIIME v1.5.0+, rich format) (download). A metadata file can be provided separately if the metadata is not present within the BIOM file.
  • Tab-separated (.txt) files: an abundance file (download) without taxonomy, plus a taxonomy mapping file (download), or an abundance file with taxonomic information (download). Metadata information is included in such files.

mothur

Two files are needed for a mothur taxonomic profile: a consensus taxonomy file (download) and a .shared file (download). The consensus taxonomy file can be created with mothur's classify.otu command. The .shared format can be created using mothur's make.shared command.

The accompanying example metadata file can be downloaded here.

BIOM format

Since QIIME v1.5.0, QIIME has used the BIOM format for its OTU tables. The BIOM file can be generated using QIIME's make_otu_table.py script.

An example BIOM file can be downloaded from here.
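
Whether a BIOM file already carries taxonomy and sample metadata can be checked before upload. The following is a minimal Python sketch, not part of the tool itself. It assumes the file is a JSON-encoded BIOM 1.0 table with the standard "rows" and "columns" keys; later BIOM versions are HDF5-based and would need a dedicated reader. The function name summarize_biom_json is hypothetical.

```python
import json

def summarize_biom_json(text):
    """Summarize a JSON-encoded (BIOM 1.0) table given as a string."""
    table = json.loads(text)
    rows, columns = table["rows"], table["columns"]
    return {
        "n_features": len(rows),
        "n_samples": len(columns),
        # Row metadata with a "taxonomy" entry is what makes the table "rich".
        "has_taxonomy": any(
            (row.get("metadata") or {}).get("taxonomy") for row in rows
        ),
        # If no sample metadata is embedded, upload a separate metadata file.
        "has_sample_metadata": any(col.get("metadata") for col in columns),
    }
```

If has_sample_metadata is false, a separate metadata file in the format described further down would be needed.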

Tab-separated (.txt) files

The tab-separated (.txt) format is used for taxonomic profiles. It is a data table of abundance values (e.g. raw counts from 16S data) saved as a tab-delimited text file (.txt), with rows for features (OTUs) and columns for samples. The file can be generated from any spreadsheet program. It must follow the specific format described below.

  • The first line should contain the sample names and start with "#NAME". The class labels of the experimental conditions should be on a new line beginning with "#CLASS".
  • The first column should contain the taxon names. Taxon names can be any valid taxonomic identifiers annotated via the Greengenes or SILVA database. Such labels contain information from domain down to species, separated by semicolons (;).
  • Non-specific taxon names (e.g. Otu0001) can also be used in the first column. In that case, a tab-delimited (.txt) taxonomy mapping file can also be uploaded, containing information from domain down to species for each taxon name.

For the taxonomy mapping file, the first row should contain the taxonomic levels, beginning with "#TAXONOMY". All non-specific taxon names should be listed in the first column of the file. Examples are provided below.

Unrecognizable terms (e.g. "uncultured" or strain identifiers) can be included without problem. There is no requirement to include information for multiple taxonomic rank levels, and there is no minimum or maximum taxonomic rank that must be included. Data cells can indicate the read count (preferable) or proportions or percentages of taxa in each sample.

Note:

  • Both sample and feature names must be unique and consist only of common English letters, underscores and numbers. Latin/Greek letters are not supported.
  • Data values (read counts or proportions) should contain only numeric, non-negative values. Use an empty cell or "NA" (without quotes) for missing values.
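
The naming and value rules above can be checked with a short script before upload. The following is a minimal Python sketch, not the validation the tool itself performs. It assumes that "common English letters, underscores and numbers" means the character set [A-Za-z0-9_]; the function names check_names and check_value are hypothetical. Note that semicolon-separated taxonomy labels would fail this pattern, so the name check is meant for sample names and non-specific taxon names.

```python
import re

# Assumption: allowed characters are ASCII letters, digits and underscores.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")

def check_names(names):
    """Return a list of problems found in a list of sample or taxon names."""
    problems = []
    seen = set()
    for name in names:
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"unsupported characters: {name!r}")
        if name in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(name)
    return problems

def check_value(cell):
    """Return True if a cell is empty, NA, or a non-negative number."""
    text = cell.strip()
    if text in ("", "NA"):
        return True  # treated as a missing value
    try:
        return float(text) >= 0
    except ValueError:
        return False
```

An empty list from check_names means no naming problems were found under these assumptions.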

Example

  • Taxonomic profiles with valid taxonomy identifier labelled names (download)
    #NAME          Sample1  Sample2 Sample3	Sample4	Sample5	Sample6	Sample7	Sample8
    #CLASS         Y        N	N	Y	N	Y	Y	N
    Archaea;           219	49	42	50	6	17	22	21
    Archaea;Crenarchaeota;Thermoprotei;           424	0	191	0	0	0	0	0
    Bacteria;Acidobacteria;           32	4	4	22	76	16	1	0
    Bacteria;Actinobacteria;           47	0	0	4	0	0	0	0
                        
  • Taxonomic profiles with non specific taxon names (download)
    #NAME          Sample1  Sample2 Sample3	Sample4	Sample5	Sample6	Sample7	Sample8
    #CLASS         Y        N	N	Y	N	Y	Y	N
    OTU1           219	49	42	50	6	17	22	21
    OTU2           424	0	191	0	0	0	0	0
    OTU3           32	4	4	22	76	16	1	0
    OTU5           47	0	0	4	0	0	0	0
                        
  • Taxonomic mapping file (download)
    #TAXONOMY	Kingdom	Phylum	Class	Order	Family	Genus	Species
    Otu00001	Bacteria	Bacteroidetes	Bacteroidia	Bacteroidales	Prevotellaceae	Prevotellaceae	
    Otu00002	Bacteria	Proteobacteria	Epsilonproteobacteria	Campylobacterales	Helicobacteraceae	Helicobacter	
    Otu00003	Bacteria	Bacteroidetes	Bacteroidia	Bacteroidales	Prevotellaceae	Alloprevotella	
    Otu00004	Bacteria	Bacteroidetes	Bacteroidia	Bacteroidales	Bacteroidaceae	Bacteroides
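
The example files above can be read with a few lines of Python. The following is a minimal sketch under the layout described in this section ("#NAME" and "#CLASS" header lines, taxon names in the first column, "#TAXONOMY" header in the mapping file); it is not the parser used by the tool, and the function names read_profile and read_taxonomy are hypothetical.

```python
import csv
import io

def read_profile(handle):
    """Parse a tab-separated profile into (samples, classes, counts)."""
    samples, classes, counts = [], [], {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        label = row[0].strip()
        cells = [cell.strip() for cell in row[1:]]
        if label == "#NAME":
            samples = cells
        elif label == "#CLASS":
            classes = cells
        else:
            # Empty cells and "NA" are stored as None (missing value).
            counts[label] = [
                None if cell in ("", "NA") else float(cell) for cell in cells
            ]
    return samples, classes, counts

def read_taxonomy(handle):
    """Parse a mapping file into {taxon name: semicolon-joined lineage}."""
    lineage = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip() or row[0].strip() == "#TAXONOMY":
            continue  # skip blank lines and the header row
        ranks = [rank.strip() for rank in row[1:] if rank.strip()]
        lineage[row[0].strip()] = ";".join(ranks)
    return lineage

# Usage with in-memory text; a real file would be opened with open(path, newline="").
profile_text = "#NAME\tSample1\tSample2\n#CLASS\tY\tN\nOTU1\t219\t49\n"
samples, classes, counts = read_profile(io.StringIO(profile_text))
```

Joining the mapped ranks with semicolons yields labels in the same style as the profiles that carry valid taxonomy identifiers.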
                        

Metadata file format (download)

The tab-delimited (.txt) format is also used for metadata files. Sample names/IDs are in the first column, which begins with "#NAME" in the first row.

For metadata, samples are in rows and metadata types (e.g. depth, temperature) are in columns. Data values should be discrete, qualitative labels (e.g. HIGH, MED, LOW). Empty or NA should be used for missing values. Use the same sample names/IDs as in your input taxonomic profile file. Make sure that neither your metadata type names nor your metadata labels contain tab characters, since tabs are used to delimit separate items.

Example

#NAME       SampleType
Sample1     skin        
Sample2     gut
Sample3     skin                                                                                                   
Sample4     gut                                               
Sample5     gut
Sample6     gut
Sample7     skin
Sample8     skin
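
Because the metadata file must use the same sample names/IDs as the taxonomic profile, it can help to compare the two before upload. The following is a minimal Python sketch that assumes a tab-delimited metadata file laid out as in the example above; the function names read_metadata and compare_samples are hypothetical and not part of the tool.

```python
import csv
import io

def read_metadata(handle):
    """Parse a metadata file into (metadata types, {sample: [labels]})."""
    types, labels = [], {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        name = row[0].strip()
        cells = [cell.strip() for cell in row[1:]]
        if name == "#NAME":
            types = cells
        else:
            labels[name] = cells
    return types, labels

def compare_samples(profile_samples, metadata_samples):
    """Return (names missing from metadata, names missing from profile)."""
    profile_set = set(profile_samples)
    metadata_set = set(metadata_samples)
    return (
        sorted(profile_set - metadata_set),
        sorted(metadata_set - profile_set),
    )

# Usage with in-memory text; a real file would be opened with open(path, newline="").
metadata_text = "#NAME\tSampleType\nSample1\tskin\nSample2\tgut\n"
types, labels = read_metadata(io.StringIO(metadata_text))
missing_in_metadata, missing_in_profile = compare_samples(
    ["Sample1", "Sample2", "Sample3"], labels.keys()
)
```

Two empty lists indicate that the sample names in both files match.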