TAXONOMY PROFILING

Taxonomy profiling of microbiome samples: a shotgun emphasis

 

Here, I show why taxonomy profiling matters and give an overview of the options for taxonomic classification, with their pros and cons. I start with the basic question: shotgun or amplicon? Then I explain the taxonomy classifiers used for shotgun sequencing: homology-based programs (divided into DNA, protein, and marker-gene algorithms) and composition-based programs.

Summary
Details
Links to taxonomy profilers
Some references

Summary…

Main questions in microbiome research:

  • Who is there?
    • Correlations with environmental variables and time.
    • Experimental models can be used to confirm correlations.
  • What is it doing?
    • Exploring the functional potential can suggest environment/host-microbe interactions and strengthen correlational studies.

Taxonomy profiling

  • Comparison of properties of the microbiome sequences against microbial reference genomes.
  • Amplicon sequencing
    • Uses oligonucleotide primers to amplify specific genomic regions: the primers bind conserved regions of the ribosomal RNA operon, flanking the variable regions (hypervariable regions or the ITS spacer) that distinguish taxa.
      • 16S – prokaryotes
      • 18S – eukaryotic microbes
      • ITS – fungi
    • Advantages
      • Cheaper
      • Less susceptible to contamination
      • Precise
      • Functional predictions based on reference-genome annotations
      • More extensive reference databases
    • Disadvantages
      • It reports only a small set of organisms (those targeted by the primers).
      • The functional potential of your samples is not measured directly; gene counts are inaccessible.
      • Lower confidence in taxonomic abundance estimates
      • Resolution: genus
  • Shotgun sequencing
    • All the DNA in the sample (the metagenome) is sequenced.
    • Advantages
      • It detects bacteria, archaea, viruses, etc.
      • Access to the functional POTENTIAL of the microbiome
      • Resolution: strains
      • Recovers single-nucleotide variants
      • Better estimates of taxonomic abundance
    • Disadvantages
      • Expensive
      • More susceptible to contamination
      • Less extensive reference databases: a larger fraction of reads may remain unclassified.
    • Types of taxonomy profilers
      • Read profiling: a more faithful representation of the DNA content. Better for fragments shorter than 1 kb.
        • Homology-based
          • DNA: more restrictive than the protein programs.
            • Kraken: k-mer-based classifier; reads as input.
            • CLARK: k-mer-based; input can be DNA/RNA reads, scaffolds, contigs, etc. Builds a single index per taxonomic rank, which lowers memory and time consumption.
          • Protein: mutation rates and the taxonomic bias of incomplete databases make it tricky to demand hits as strict as those of the DNA programs. Useful to cross-reference a read's predicted protein with the read's taxonomic classification.
            • MEGAN: reads as input. Higher precision, at the cost of more time per classification. Requires your reads to be aligned beforehand. Easy visualization.
            • Kaiju: reads as input.
            • CARMA: reads as input. Pfam-based.
          • Marker genes: genes shared across domains and clade-specific genes. Fewer sequences are classified, but with fewer false positives; faster.
            • MetaPhlAn
            • MetaPhyler
        • Composition-based: a wider perspective. Smaller databases. Useful to classify complex environments. Lower memory and time requirements.
          • PhyloPythia (SVM): assemblies as input.
          • Phymm (IMMs): reads as input.
          • PhymmBL: improves the accuracy of Phymm.
      • Assembled microbiome profiling
      • Heuristic solutions: k-mers, used in different types of taxonomy profilers.
        • Short k-mers: lower precision. More suitable if your microbiota is composed of distant taxa.
        • Longer k-mers: more restrictive.
  • What should I consider?
    • Resources
    • Expertise
    • How many samples do I have?
    • Do I have a complex environment?
    • Priority: Taxonomy or functional potential?
    • Expected resolution
    • Do I suspect taxonomically distant taxa in my samples?
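
The k-mer idea behind DNA classifiers such as Kraken, and the k-mer length trade-off noted above, can be sketched in a few lines of Python. This is a simplified toy, assuming a hypothetical three-genome taxonomy: every k-mer is mapped to the lowest common ancestor (LCA) of the genomes that contain it, and a read goes to the taxon whose root-to-taxon path collects the most k-mer hits. Real tools add canonical (strand-independent) k-mers, compact indexes and confidence thresholds; the taxa, sequences and function names here are illustrative, not any tool's API.

```python
from collections import defaultdict

# Hypothetical toy taxonomy (child -> parent); not a real reference database.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Escherichia": "Bacteria",
    "E. coli": "Escherichia",
    "E. albertii": "Escherichia",
    "Bacillus": "Bacteria",
    "B. subtilis": "Bacillus",
}

# Hypothetical toy "genomes"; real references are millions of bases long.
GENOMES = {
    "E. coli": "ACGTACGTTGCA",
    "E. albertii": "ACGTACGTAACC",
    "B. subtilis": "TTTTGGGGCCCC",
}


def lineage(taxon):
    """Path from a taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy tree."""
    ancestors_of_a = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors_of_a:
            return taxon
    return "root"


def kmers(seq, k):
    """All overlapping substrings of length k (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(genomes, k):
    """Map every k-mer to the LCA of all genomes that contain it."""
    index = {}
    for taxon, seq in genomes.items():
        for kmer in kmers(seq, k):
            index[kmer] = lca(index[kmer], taxon) if kmer in index else taxon
    return index


def classify_read(read, index, k):
    """Count k-mer hits per taxon, then pick the taxon whose root-to-taxon
    path collects the most hits; tied taxa are resolved to their LCA.
    Returns None when no k-mer of the read is in the index (unclassified)."""
    hits = defaultdict(int)
    for kmer in kmers(read, k):
        if kmer in index:
            hits[index[kmer]] += 1
    if not hits:
        return None
    scores = {t: sum(hits[a] for a in lineage(t) if a in hits) for t in hits}
    best = max(scores.values())
    tied = [t for t, s in scores.items() if s == best]
    result = tied[0]
    for t in tied[1:]:
        result = lca(result, t)
    return result


def species_specific_fraction(index, genomes):
    """Share of indexed k-mers that still point to a single genome."""
    return sum(t in genomes for t in index.values()) / len(index)


if __name__ == "__main__":
    idx4 = build_index(GENOMES, k=4)
    print(classify_read("ACGTTGCA", idx4, k=4))  # E. coli
    print(classify_read("ACGTACGT", idx4, k=4))  # Escherichia (shared region)
    print(classify_read("GATCGATC", idx4, k=4))  # None (no k-mer hits)
    # Shorter k-mers are shared by more genomes, so fewer stay species-specific.
    idx2 = build_index(GENOMES, k=2)
    print(species_specific_fraction(idx4, GENOMES),
          species_specific_fraction(idx2, GENOMES))
```

With k = 4, most indexed k-mers still point to a single genome, while with k = 2 most of them collapse to a higher rank (genus or domain), which is why short k-mers lower precision and long k-mers are more restrictive. A read drawn from the region shared by both Escherichia genomes is only resolved to genus.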
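
Composition-based classifiers such as PhyloPythia and Phymm assign a sequence from its nucleotide composition rather than from matches against reference sequences. The sketch below swaps in a much simpler technique than the SVM or interpolated Markov models named above: a nearest-centroid classifier on dinucleotide frequency vectors. The reference sequences, taxon names and function names are hypothetical; the sketch only illustrates why these methods need a smaller database (one profile per taxon) and can still assign a sequence that matches no reference exactly.

```python
import math
from itertools import product


def kmer_profile(seq, k=2):
    """Relative frequency of every possible DNA k-mer, in a fixed order."""
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(alphabet, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in alphabet]


def euclidean(p, q):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def train_centroids(references, k=2):
    """One centroid per taxon: the mean k-mer profile of its reference sequences."""
    centroids = {}
    for taxon, seqs in references.items():
        profiles = [kmer_profile(s, k) for s in seqs]
        centroids[taxon] = [sum(col) / len(profiles) for col in zip(*profiles)]
    return centroids


def classify_by_composition(seq, centroids, k=2):
    """Assign the taxon whose centroid is closest to the sequence's profile."""
    profile = kmer_profile(seq, k)
    return min(centroids, key=lambda taxon: euclidean(profile, centroids[taxon]))


# Hypothetical toy references: one GC-rich and one AT-rich taxon.
REFERENCES = {
    "GC-rich taxon": ["GCGCGGCCGCGC", "CCGGCGCGGGCC"],
    "AT-rich taxon": ["ATATTAAATTAT", "TTAATATATAAT"],
}

if __name__ == "__main__":
    centroids = train_centroids(REFERENCES)
    print(classify_by_composition("GGCGCCGC", centroids))  # GC-rich taxon
    print(classify_by_composition("AATTATAT", centroids))  # AT-rich taxon
```

Because each taxon is summarized by a single profile, the "database" is tiny and the query sequences need not occur in any reference; the price is coarser, less reliable assignments, especially for short reads.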

Some details…

Links: Taxonomy profiling programs…

Some references…