Cassandra

Quick Start Guide

Installing Cassandra

Cassandra v15.4.10 combines Annovar output with other public data sources to produce annotated .vcf files.

Dependencies: Perl, Java, Annovar.
  1. Download the Cassandra jar file and the associated datasources.
  2. Unpack the datasources directory (DataSources): tar -zxvf cassandraDataSources.tar.gz
  3. Install Annovar: http://www.openbioinformatics.org/annovar/annovar_download.html
  4. Install the human hg19 RefSeq, UCSC and Ensembl data files for Annovar:
    • perl annotate_variation.pl --buildver hg19 -downdb refGene
    • perl annotate_variation.pl --buildver hg19 -downdb knownGene
    • perl annotate_variation.pl --buildver hg19 -downdb ensGene
  5. Also download and copy the following files into the same directory as the other data files:
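
Steps 2 and 4 above can be scripted. The following is a minimal sketch and is not part of the Cassandra distribution. The directory names (annovar, annovar/humandb) are hypothetical, and the trailing database-directory argument is an assumption based on how Annovar is usually told where to store downloads. By default the script only prints the commands; set DRY_RUN=0 to execute them. It does not cover the files referred to in step 5.

```shell
#!/bin/sh
# Sketch only: ANNOVAR_DIR and ANNOVAR_DB are hypothetical locations.
ANNOVAR_DIR="${ANNOVAR_DIR:-annovar}"
ANNOVAR_DB="${ANNOVAR_DB:-$ANNOVAR_DIR/humandb}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, 0 = run them

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 2: unpack the datasources archive.
run tar -zxvf cassandraDataSources.tar.gz

# Step 4: fetch the three hg19 gene tables (RefSeq, UCSC, Ensembl).
for db in refGene knownGene ensGene; do
    run perl "$ANNOVAR_DIR/annotate_variation.pl" --buildver hg19 -downdb "$db" "$ANNOVAR_DB"
done
```

Printing first makes it easy to check the paths before anything is downloaded.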

Cassandra annotates both SNPs and Indels. It can also accept a pileup file if desired.

The syntax for samtools pileup is:

  • For SNPs
    • samtools mpileup -I -l -f >
  • For INDELs
    • samtools mpileup  -l -f >
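
The commands above are shown without their file arguments. As an illustration only, the sketch below fills them with hypothetical placeholder names (reference.fa, regions.bed, sample.bam), assuming the usual samtools meaning of the flags: -l takes a positions or regions file, -f takes the reference FASTA, and -I skips indel calling. The actual files depend on your own data. By default the commands are printed rather than run.

```shell
#!/bin/sh
# Sketch only: every file name below is a hypothetical placeholder
# (assumed to contain no spaces).
REF="${REF:-reference.fa}"         # reference FASTA, passed to -f
REGIONS="${REGIONS:-regions.bed}"  # positions/regions list, passed to -l
BAM="${BAM:-sample.bam}"           # aligned reads
DRY_RUN="${DRY_RUN:-1}"            # 1 = print commands, 0 = run them

# Print the mpileup command for one variant type.
# "snp" adds -I so that samtools skips indel calling.
pileup_command() {
    case "$1" in
        snp)   echo "samtools mpileup -I -l $REGIONS -f $REF $BAM" ;;
        indel) echo "samtools mpileup -l $REGIONS -f $REF $BAM" ;;
        *)     echo "unknown variant type: $1" >&2; return 1 ;;
    esac
}

for kind in snp indel; do
    cmd=$(pileup_command "$kind") || exit 1
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd > sample.$kind.pileup"
    else
        sh -c "$cmd" > "sample.$kind.pileup"
    fi
done
```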

Running Cassandra 

Usage: java -jar Cassandra.jar

Required parameters:

  • (-t|--task)   Annotation task to run. Available tasks:
    • SNP
    • Indel
    • Annotate (handles both SNPs and Indels; recommended)
  • (-i|--input)   Input .vcf file
  • --annovarPath   Path to directory containing annotate_variation.pl
  • --annovarDB   Path to directory holding the annovar datafiles
  • --annotationSources   Path to directory containing annotation sources

It is recommended that you use the 'Annotate' task; the new datasources and the --customAnalyses option only work with this option. The 'SNP' and 'Indel' tasks are designed specifically for a workflow used internally at the HGSC and have not been updated.
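
A run of the recommended 'Annotate' task might look like the sketch below. It uses only the parameters listed above. All file and directory names are hypothetical placeholders, and Cassandra.jar is assumed to be in the current directory. The command is printed by default; set DRY_RUN=0 to execute it.

```shell
#!/bin/sh
# Sketch only: all paths below are hypothetical placeholders.
INPUT="${INPUT:-sample.vcf}"                 # input .vcf file
ANNOVAR_PATH="${ANNOVAR_PATH:-annovar}"      # holds annotate_variation.pl
ANNOVAR_DB="${ANNOVAR_DB:-annovar/humandb}"  # holds the annovar datafiles
SOURCES="${SOURCES:-DataSources}"            # unpacked annotation sources
DRY_RUN="${DRY_RUN:-1}"                      # 1 = print command, 0 = run it

cassandra_annotate() {
    # Build the argument list in the function's positional parameters.
    set -- java -jar Cassandra.jar \
        -t Annotate \
        -i "$INPUT" \
        --annovarPath "$ANNOVAR_PATH" \
        --annovarDB "$ANNOVAR_DB" \
        --annotationSources "$SOURCES"
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

cassandra_annotate
```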

Optional parameters:

  • (-o|--output)   Output .vcf file. If left blank, '_Annotated' is added to the input filename. (default: )
  • (-p|--pileup)   Pileup file to use when making PU info
  • (-n|--nproc)    Number of processors available for use (default: 1)
  • [--tempDir ]  Where to write temporary files (default: /tmp)
  • [--annovarOptions]  Any command line options to use when running Annovar (default: --splicing_threshold 5)
  • [--maxRecords ]  Maximum number of records to hold in RAM. (default: 250000)
  • [--customAnalyses] Choose which analyses to run, as a comma-separated list. Available analyses:
    • DbSnp,ESP6500,CgMaf,Mappability,1000Genomes,Uniprot,CosmicCoding,DbNSFP,Encode,CosmicNonCoding
  • --annovarPath
            Path to directory containing annotate_variation.pl
  • --annovarDB
            Path to annovar datafiles
  • --annotationSources
            Path to directory containing annotation sources
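
Because --customAnalyses takes a comma-separated list, it can help to check the names before starting a long run. The sketch below is an assumption-laden helper, not part of Cassandra: check_analyses is a hypothetical function, matching is assumed to be case-sensitive, and all file names are placeholders. It prints an example command combining several of the optional parameters instead of running it.

```shell
#!/bin/sh
# Names listed above as available for --customAnalyses.
AVAILABLE="DbSnp,ESP6500,CgMaf,Mappability,1000Genomes,Uniprot,CosmicCoding,DbNSFP,Encode,CosmicNonCoding"

# Hypothetical helper: succeed only if every name in the
# comma-separated list $1 appears in AVAILABLE (exact match).
check_analyses() {
    [ -n "$1" ] || return 1
    old_ifs=$IFS
    IFS=,
    for name in $1; do
        case ",$AVAILABLE," in
            *",$name,"*) ;;
            *) IFS=$old_ifs
               echo "unknown analysis: $name" >&2
               return 1 ;;
        esac
    done
    IFS=$old_ifs
}

ANALYSES="${ANALYSES:-DbSnp,ESP6500,DbNSFP}"
if check_analyses "$ANALYSES"; then
    # Placeholder paths; the command is printed, not executed.
    echo "java -jar Cassandra.jar -t Annotate -i sample.vcf" \
         "-o annotated.vcf -n 4 --tempDir /tmp" \
         "--annovarPath annovar --annovarDB annovar/humandb" \
         "--annotationSources DataSources" \
         "--customAnalyses $ANALYSES"
fi
```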

Download Cassandra 15.4.10

Cassandra version 15.4.10

New in version 15.4.10

The following data sources have been updated:

  • ARIC - Allele frequencies from the ARIC cohort
  • ESEIndel - Allele frequencies from TCGA and 1KG
  • CADD - Combined Annotation Dependent Depletion scores (CADD)
  • ERBCTA - Activity prediction from Ensembl Regulatory Build
  • DBSCSNV - Damaging prediction for SNVs in splicing consensus regions
  • ERBSUM - Ensembl Regulatory build summary.
  • FANTOM_ENHANCER - Site is in an enhancer predicted by FANTOM5.
  • FANTOM_CAGE - Site is within a FANTOM5 Cap Analysis of Gene Expression (CAGE) region.
  • ERBSEG - Genome segment prediction based on 17 cell types from ENCODE and Roadmap by Ensembl Regulatory Build.
  • ERBTFBS - Predicted TFBS from Ensembl Regulatory Build.
  • Encode Genome Segmentation - Genome segmentation by ENCODE.
  • ENCDNA - ENCODE DNase I hypersensitivity sites.
  • ENCTFBS - ENCODE transcription factor binding site score.
  • funseq2 - funseq2 noncoding score.
  • GRASP - Analysis of genotype-phenotype results from 1390 genome-wide association studies.
  • OREGANNO - Gene regulatory element and polymorphism annotation.
  • RegulomeDB - DNA features and regulatory elements in non-coding regions of the human genome.
  • Phastcons - Phastcons conservation scores.
  • phylop - Phylop conservation scores.
  • siphy - Detects bases under selection.
  • exac - Exome Aggregation Consortium v0.3.
  • funseq - Funseq noncoding score.
  • Omim - Human genes and genetic phenotypes.

Previous versions of Cassandra

Cassandra version 14.2.5

New in version 14.2.5

Three additional optional parameters:

  • --annovarPath
            Path to directory containing annotate_variation.pl
  • --annovarDB
            Path to annovar datafiles
  • --annotationSources
            Path to directory containing annotation sources

The following data sources have been updated:

  • Cosmic Release v68
  • ENCODE, taken from Ensembl release 75
  • DbNSFP v2.4
  • DbSNP v138

 

Cassandra version 13.10.01

New in version 13.10.01

  • now handles multiple alleles at the same position
  • handles multiple samples per vcf
  • SNPs and Indels processed together
  • added Ensembl and mitochondrial annotation
  • added sqlite db to track datasources
  • updated all datasources and added some new ones: Encode,Cosmic,

License

Copyright © Baylor College of Medicine Human Genome Sequencing Center. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE