MicrobiomeAnalyst

Data Normalization

Normalization aims to address the variability in sampling depth and the sparsity of the data to enable more biologically meaningful comparisons. All of these methods require raw count data as input. You can rarefy your data and then apply either data scaling or data transformation. You cannot apply both data scaling and data transformation, because scaled or transformed data are no longer valid count data.

When the library sizes are very different (i.e., they differ by more than 10-fold), rarefying is recommended (see Weiss, S. et al.). Rarefying is mainly used for 16S marker gene data and is disabled for shotgun metagenomics data.
The normalized data are mainly used for data visualization (e.g., boxplots) and for general statistical methods such as t-tests and ANOVA. For statistical methods that come with their own normalization, such as DESeq2, edgeR, limma, or metagenomeSeq, MicrobiomeAnalyst applies each method's own normalization (as recommended in its user manual) directly to the filtered count data.
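
As a rough illustration of the 10-fold criterion, the sketch below computes per-sample library sizes and the ratio between the largest and smallest. The count table, sample names, and function names are hypothetical and are not part of MicrobiomeAnalyst.

```python
# Hypothetical count table: sample name -> raw counts per taxon.
counts = {
    "S1": [120, 30, 0, 50],
    "S2": [15, 2, 1, 0],
    "S3": [900, 400, 250, 450],
}

def library_sizes(table):
    """Return the total read count (library size) of each sample."""
    return {sample: sum(values) for sample, values in table.items()}

def size_ratio(table):
    """Return the ratio of the largest to the smallest library size."""
    sizes = list(library_sizes(table).values())
    if min(sizes) == 0:
        raise ValueError("a sample has zero reads; remove it first")
    return max(sizes) / min(sizes)

ratio = size_ratio(counts)
# Threshold of 10 taken from the guidance above.
suggest_rarefying = ratio > 10
print(library_sizes(counts), round(ratio, 1), suggest_rarefying)
```

Here the library sizes differ by far more than 10-fold, so rarefying would be worth considering for this toy table.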

Data rarefying

Do not rarefy my data

Rarefy to a library size of

Data scaling

Do not scale my data

Total sum scaling (TSS)

Cumulative sum scaling (CSS)

Upper-quartile normalization (UQ)

Data transformation

Do not transform my data

Relative log expression (RLE)

Trimmed mean of M-values (TMM)

Centered log ratio (CLR)

Data scaling aims to bring all samples to the same scale by dividing each sample by a scaling factor. Some common choices include total sum scaling (TSS), cumulative sum scaling (CSS), and upper-quartile scaling (UQ).
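
A minimal sketch of two of these scaling factors follows. TSS divides each count by the sample total. For UQ, the sketch assumes one simplified variant (dividing by the 75th percentile of the sample's nonzero counts); the exact definition used by MicrobiomeAnalyst may differ. The function names are hypothetical, and CSS is omitted because it needs a data-driven quantile choice.

```python
from statistics import quantiles

def total_sum_scale(sample_counts):
    """TSS: divide every count by the sample's total count."""
    total = sum(sample_counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in sample_counts]

def upper_quartile_factor(sample_counts):
    """Assumed UQ variant: 75th percentile of the nonzero counts."""
    nonzero = sorted(c for c in sample_counts if c > 0)
    if len(nonzero) < 2:
        raise ValueError("need at least two nonzero counts")
    # quantiles(..., n=4) returns the three quartile cut points.
    return quantiles(nonzero, n=4, method="inclusive")[2]

def upper_quartile_scale(sample_counts):
    """UQ: divide every count by the sample's upper-quartile factor."""
    factor = upper_quartile_factor(sample_counts)
    return [c / factor for c in sample_counts]

sample = [10, 20, 30, 40, 0]
tss = total_sum_scale(sample)
uq = upper_quartile_scale(sample)
print(tss)
print(uq)
```

After TSS the values of a sample sum to 1, which is why the result is often read as relative abundance.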

Data transformation applies a variance-stabilizing transformation, such as the log-ratio transformation and its variations. Some common choices include centered log-ratio (CLR) transformation, relative log expression (RLE) normalization, and weighted trimmed mean of M-values (TMM).
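
To make the CLR transformation concrete: each value is replaced by its logarithm minus the mean logarithm of the sample (equivalently, the log of the value divided by the sample's geometric mean). The sketch below assumes a pseudocount of 1 to handle zeros; that choice and the function name are illustrative assumptions, not necessarily what MicrobiomeAnalyst does.

```python
import math

def clr(sample_counts, pseudocount=1.0):
    """Centered log-ratio: log(x + pseudocount) minus the mean log.

    The pseudocount is an assumption made here so that zero counts
    have a defined logarithm.
    """
    logs = [math.log(c + pseudocount) for c in sample_counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

sample = [9, 99, 999]
transformed = clr(sample)
print(transformed)
```

CLR values within a sample always sum to zero, which is a quick sanity check on any implementation.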

All samples will be rarefied to an even sequencing depth, set by the sample with the lowest sequencing depth. If that sample has very few reads, you may need to exclude it manually (using the Sample Editor) to avoid significant data loss. You can check whether this is the case with View Sample Size on the Data Summary page.
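
Rarefying to the minimum depth can be sketched as random subsampling of reads without replacement. The example below is an illustration under that assumption; the table, function names, and seed are hypothetical and do not reproduce MicrobiomeAnalyst's own R implementation.

```python
import random
from collections import Counter

def rarefy(sample_counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample."""
    total = sum(sample_counts)
    if depth > total:
        raise ValueError("depth exceeds the sample's library size")
    # Expand counts into one entry per read, labeled by taxon index.
    reads = [taxon for taxon, c in enumerate(sample_counts) for _ in range(c)]
    drawn = Counter(rng.sample(reads, depth))
    return [drawn.get(taxon, 0) for taxon in range(len(sample_counts))]

def rarefy_to_minimum(table, seed=0):
    """Rarefy every sample to the smallest library size in the table."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    depth = min(sum(values) for values in table.values())
    return {name: rarefy(values, depth, rng) for name, values in table.items()}

# Hypothetical count table: sample name -> raw counts per taxon.
table = {
    "S1": [120, 30, 0, 50],
    "S2": [15, 2, 1, 0],
    "S3": [900, 400, 250, 450],
}
rarefied = rarefy_to_minimum(table)
print({name: sum(values) for name, values in rarefied.items()})
```

In this toy table, S2 has only 18 reads, so every other sample is cut down to 18 reads. This is the kind of data loss that excluding a very shallow sample avoids.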

You can rarefy the samples to a specific depth. The default value is the minimum library size after filtration. The maximum allowable depth is capped at the third quartile of the library sizes after filtration.
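
The default and the cap described above can be computed from the library sizes as follows. This is a sketch only: the function name and example sizes are hypothetical, and the quartile is computed with linear interpolation, which may differ slightly from the method MicrobiomeAnalyst uses.

```python
from statistics import quantiles

def rarefaction_depth_bounds(library_sizes):
    """Return (default_depth, max_depth) for a list of library sizes.

    default_depth: the minimum library size.
    max_depth: the upper-quartile cut point (assumed: linear interpolation).
    """
    sizes = sorted(library_sizes)
    if len(sizes) < 2:
        raise ValueError("need at least two samples")
    default_depth = sizes[0]
    max_depth = quantiles(sizes, n=4, method="inclusive")[2]
    return default_depth, max_depth

# Hypothetical library sizes after filtration.
sizes = [1000, 2000, 3000, 4000, 5000]
default_depth, max_depth = rarefaction_depth_bounds(sizes)
print(default_depth, max_depth)
```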

The normalized data will be available on the "Downloads" page as "data_normalized.csv".
