BCM-HGSC Software

Atlas 2 Atlas2 is a next-generation sequencing suite of variant analysis tools specializing in the separation of true SNPs and insertions and deletions (indels) from sequencing and mapping errors in Whole Exome Capture Sequencing (WECS) data.
ATLAS GapFill ATLAS GapFill (1) identifies gap-associated reads from BWA mapping results for each gap and assembles each gap locally using different available assemblers (Phrap, Newbler, and Velvet), (2) compares the locally assembled contigs to the corresponding reference scaffold using Crossmatch, and (3) fills the gaps of the scaffold with the locally assembled contigs that bridge the gaps.
Atlas Whole Genome Assembly Suite Atlas is a collection of software tools to facilitate the assembly of large genomes from whole genome shotgun reads, or a combination of whole genome shotgun reads and BAC or other localized reads.
Atlas-Link Atlas-Link links and orients genome sequence contigs quickly and accurately using mate-pair information. Atlas-Link can take advantage of unused mate-pair data from a WGS genome assembly or additional sequence data from different sequencing technologies or more recent sequence production. Atlas-Link currently supports Illumina, SOLiD, 454, and Sanger sequencing technologies.
atlas-readpainter atlas-readpainter is a tool for multiple alignment to a reference sequence.
Bang Bang is a fast repeat-suppressing search tool, written primarily for anchoring reads to genomes but also adaptable to other genome-scale comparison problems.
BCM Java Alignment Viewer The Java Alignment Viewer is a Java application that provides improved sequence alignment visualization for amino acid and nucleotide alignments. It is especially effective for large sequences.
BCM Trace Viewer The BCM Trace Viewer is a Java application/applet to display .scf traces and phred quality values.
bcm-ace-plots bcm-ace-plots reads in an Ace format assembly file produced by the Phrap (Green et al.) assembly software. Using special tags in the read names, bcm-ace-plots will plot the template coverage, the template span, the coverage, the BAC coverage, the WGS coverage, quality, and high quality discrepancies.
Cassandra Cassandra v15.4.10 combines ANNOVAR output with other public data sources to output annotated .vcf files.
CBT++ CBT++ (Computational Biology Tools) is a collection of C++ classes intended to simplify the development of complex computational biology applications. It includes basic functionality to manipulate sequences, classes to parse common kinds of files, and implementations of various sequence analysis algorithms.
CoCa CoCa is a comprehensive COVID analysis pipeline for short reads.
DRAGEN DRAGEN uses novel methods based on multigenomes, hardware acceleration, and machine-learning-based variant detection to provide insights into individual genomes with ~30 min of computation time (from raw reads to variant detection).
ERIS ERIS is quality control software that assesses possible contamination of Illumina whole-genome and whole-exome sequence data by comparing sequenced reads to SNP array data. Implemented as an automated pipeline at the BCM-HGSC, ERIS also validates the identities of all samples, detecting potential swaps that can occur throughout the pipeline.
ExCID Report The Exome Coverage and Identification (ExCID) Report is a software tool developed at BCM-HGSC to assess sequence depth in user-defined targeted regions. The tool was initially developed for use in targeted capture applications, but its functionality has evolved to encompass any sequencing application from amplicon and targeted capture sequencing to WGS. ExCID analyzes sequence depth of any sequencing event, reports the average coverage across each target, and identifies bases below a user-defined threshold (20X coverage by default). Furthermore, the tool annotates the target with the latest gene, transcript, and exon information from RefSeq and the Human Gene Mutation Database (HGMD). The report has the option to output data tracks of sample targets and coverage that can be visualized in UCSC and IGV genome browsers.
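The per-target summary described above (average coverage plus the bases that fall below a threshold) can be illustrated with a minimal Python sketch. This is not ExCID's code: the function name `summarize_target` and the list-of-depths input are assumptions made for illustration, and only the 20X default comes from the description.

```python
from statistics import mean


def summarize_target(depths, threshold=20):
    """Summarize per-base depths for one target region.

    Returns the average depth and the 0-based offsets (within the
    target) of bases whose depth is below the threshold.
    `depths` is a hypothetical input: one integer depth per base.
    """
    if not depths:
        raise ValueError("target has no bases")
    low_offsets = [i for i, depth in enumerate(depths) if depth < threshold]
    return mean(depths), low_offsets


# Hypothetical depths for a five-base target.
average, low = summarize_target([30, 25, 18, 40, 5])
print(average, low)
```

Raising the threshold argument marks more bases as under-covered without changing the reported average.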
Honey PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants.
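To make "soft-clipped tails" concrete: in SAM alignments, a CIGAR string records soft-clipped bases with the `S` operation at either end of a read. The sketch below only measures those tails from a CIGAR string. It is not PBHoney's implementation, and the helper name `soft_clipped_tails` is hypothetical.

```python
import re

# One CIGAR element is a length followed by a SAM operation code.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_tails(cigar):
    """Return (left, right) soft-clip lengths for a SAM CIGAR string."""
    ops = [(int(length), op) for length, op in CIGAR_ELEMENT.findall(cigar)]
    # Hard clips (H) may sit outside soft clips, so ignore them here.
    core = [element for element in ops if element[1] != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


# A long read whose middle aligns and whose two ends are soft-clipped.
print(soft_clipped_tails("1500S8000M2000S"))
```

A caller built on this idea would presumably keep only tails above some minimum length before realigning them; that threshold is left out here because the description does not state one.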
Jelly PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in fasta format) to high-confidence draft assemblies. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes.
MegaDot MegaDot is a large-scale dot plotter capable of producing dot density plots of tens of megabases of DNA sequence on a modestly sized desktop machine, and of entire mammalian chromosomes on server-scale computers.
Mercury Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts. The local deployment at HGSC handles analysis at terabyte scales, working reliably 24x7 all year round. Instructions for installing Mercury on your local infrastructure are described below. We have also ported Mercury to the cloud, allowing researchers to test-drive our pipeline without installing or configuring anything. This is enabled using the DNAnexus platform.
Parliament2 Parliament2 identifies structural variants in a given sample relative to a reference genome. These structural variants cover large events that are called as deletions of a region, insertions of a sequence into a region, duplications of a region, inversions of a region, or translocations between two regions in the genome.
Princess Princess is a fast and scalable framework to detect and report haplotype-resolved Single Nucleotide Variants (SNVs) and Structural Variations (SVs). It can leverage your cluster environment to speed up detection, which starts from one or many fasta or fastq files.
SimPed SimPed quickly generates haplotype and/or genotype data for a large number of marker loci regardless of pedigree structure. The program is written in C and can be run under Windows, Linux, Unix, or Mac OS X. This program has been developed in the laboratory of Suzanne M. Leal, Ph.D.
Sniffles2 A fast structural variant caller for long-read sequencing, Sniffles2 accurately detects SVs at the germline, somatic, and population levels for PacBio and Oxford Nanopore read data.
SNPTools SNPTools is a suite of tools that enables integrative SNP analysis of next-generation sequencing data from large cohorts. It not only calls SNPs in a population with high sensitivity and accuracy, but also employs a novel imputation engine to achieve highly accurate genotype calls efficiently.
SVCollector SVCollector is an open-source method that optimally selects samples to maximize variant discovery and validation using long read resequencing or PCR-based validation. SVCollector has two modes: selecting those samples that are individually the most diverse or those that collectively capture the largest number of variations.
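The difference between the two selection modes can be sketched in a few lines of Python. This is an illustration under assumptions, not SVCollector's code: the function names `top_n` and `greedy`, the dictionary input mapping each sample to a set of variant IDs, and the alphabetical tie-breaking are all hypothetical.

```python
def top_n(sample_variants, n):
    """Pick the n samples that individually carry the most variants."""
    ranked = sorted(sample_variants, key=lambda s: (-len(sample_variants[s]), s))
    return ranked[:n]


def greedy(sample_variants, n):
    """Pick n samples one at a time, each adding the most unseen variants."""
    covered = set()
    chosen = []
    remaining = dict(sample_variants)
    for _ in range(min(n, len(remaining))):
        # Score each candidate by how many variants it adds beyond `covered`.
        best = min(remaining, key=lambda s: (-len(remaining[s] - covered), s))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Hypothetical cohort: sample name -> set of variant IDs.
cohort = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {5, 6}}
print(top_n(cohort, 2))
print(greedy(cohort, 2))
```

In this toy cohort the two modes disagree on the second pick: sample B carries more variants than C, but all of B's variants are already covered by A, so the collective mode prefers C.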
SVhound SVhound is a framework to predict regions that harbour as-yet-unidentified genotypes of Structural Variations. It uses a population-scale VCF file as input and reports the probabilities and regions across the population. SVhound was tested on and applied to the 1000 Genomes VCF file as well as other data sets.
xAtlas xAtlas is a fast and scalable small variant caller that has been developed at the Baylor College of Medicine Human Genome Sequencing Center.