Genomic Assembly and Analysis Workflows

The Pathogen Genomics Centers of Excellence (PGCoE) Network is currently conducting a viral respiratory benchmarking study, which aims to compare the results of pathogen genome assembly workflows used across the United States. Once the study is complete, the bioinformatics workflows listed below will be annotated according to their applicability for different sequencing platforms and effectiveness for specific pathogens.

Further information on this benchmarking study, including details about the tool we are using to evaluate sequencing results, is available in our benchmarking repository.

Pathogen Genome Assembly Workflows

TheiaCoV Workflow Series

A workflow series for genomic characterization of SARS-CoV-2. There are currently five TheiaCoV workflows designed to accommodate different kinds of input data, including sequencing data generated on the Illumina, ONT, and ClearLabs platforms.

TheiaMeta

A workflow series for genomic analysis from metagenomic sequencing, using both reference-based and reference-free approaches.

TheiaProk Workflow Series

A workflow series for genomic characterization of bacterial genomes. There are currently four TheiaProk workflows designed to accommodate different kinds of input data, including sequencing data generated on the Illumina and ONT platforms.

VAPER

A pipeline for assembling viral whole genome sequences from probe enrichment (a.k.a. hybrid capture/enrichment), shotgun metagenomic, and tiled-amplicon sequence data.

viral-pipelines

A set of scripts and tools for the assembly and analysis of viral next-generation sequencing data. All workflows are also available on the Terra platform.

virus-shotgun

A simple pipeline for metagenomic classification (using Kraken2), assembly, and variant calling from viral sequencing data.

Genomic Analysis Workflows

BigBacter

BigBacter is a bacterial genomic surveillance pipeline that builds phylogenetic trees, SNP matrices, and summary statistics, with output generated as MicroReact files. It also maintains a running archive of genomic clusters, adding new sequences to an existing cluster when they are genomically similar, which enables longitudinal cluster monitoring.

camlhmp

An infrastructure for defining typing schemas for organisms. Current tools developed using this infrastructure include pasty (Pseudomonas aeruginosa serotyper), pbptyper (Streptococcus pneumoniae penicillin-binding protein typing), sccmec (Staphylococcal chromosome cassette mec typing), and tulatyper (Francisella tularensis subtyping), all of which are integrated with Bactopia.

Delphy

A fast, scalable, accurate, and accessible tool for Bayesian phylogenetics with a drag-and-drop interface that requires no local installation or coding.

JUNIPER

A framework for predicting both phylogenies and transmission networks for viral outbreaks, accounting for both within-host variability and incomplete data.

reconstructR

An R package that allows for prediction of transmission events in infectious disease outbreaks and can account for within-host variability.

Tip Trait Association Testing (TTAT)

A statistical framework for testing whether closely related taxa on a phylogenetic tree share trait values more often than expected by chance, which lets users test for association between tree tips and chosen characteristics. It also includes features for analyzing subregion heterogeneity of viral transmission and for tracking when, where, and how many case introductions occur.
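
To illustrate the general idea of a tip-trait association test, the sketch below runs a simple label-permutation test on a toy pairwise distance matrix. It is not TTAT's implementation or its statistic. The function names, the test statistic (mean distance between tips that share a trait), and the toy data are all assumptions chosen for illustration.

```python
import random
from itertools import combinations


def mean_same_trait_distance(dist, traits):
    """Mean pairwise tree distance over tip pairs that share a trait value."""
    same = [dist[i][j] for i, j in combinations(range(len(traits)), 2)
            if traits[i] == traits[j]]
    if not same:
        raise ValueError("no two tips share a trait value")
    return sum(same) / len(same)


def permutation_p_value(dist, traits, n_perm=999, seed=0):
    """One-sided permutation test (hypothetical helper, not part of TTAT).

    Shuffles trait labels across tips and counts how often the shuffled
    statistic is at least as small as the observed one. A small p-value
    suggests tips sharing a trait sit closer on the tree than chance predicts.
    """
    rng = random.Random(seed)
    observed = mean_same_trait_distance(dist, traits)
    shuffled = list(traits)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_same_trait_distance(dist, shuffled) <= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (made up): eight tips in two clades of four.
# Distance is 1 within a clade and 10 between clades.
clade = [0, 0, 0, 0, 1, 1, 1, 1]
dist = [[0 if i == j else (1 if clade[i] == clade[j] else 10)
         for j in range(8)] for i in range(8)]
traits = ["A", "A", "A", "A", "B", "B", "B", "B"]  # trait tracks clade exactly

observed, p_value = permutation_p_value(dist, traits)
print(f"observed statistic = {observed}, p = {p_value:.3f}")
```

Because shuffling keeps the number of tips per trait value fixed, the test asks only whether the arrangement of traits on the tree is unusual, not whether some traits are more common than others.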