Title | A multi-task convolutional deep neural network for variant calling in single molecule sequencing. |
Publication Type | Journal Article |
Year of Publication | 2019 |
Authors | Luo, R, Sedlazeck, FJ, Lam, T-W, Schatz, MC |
Journal | Nat Commun |
Volume | 10 |
Issue | 1 |
Pagination | 998 |
Date Published | 2019 Mar 01 |
ISSN | 2041-1723 |
Keywords | Base Sequence; Computational Biology; DNA Mutational Analysis; Genome, Human; Genome-Wide Association Study; Genomics; Genotype; Genotyping Techniques; Humans; INDEL Mutation; Nanopores; Neural Networks, Computer; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; Software |
Abstract | The accurate identification of DNA sequence variants is an important, but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5-15%. Meeting this demand, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieves 99.67, 95.78, 90.53% F1-score on 1KP common variants, and 98.65, 92.57, 87.26% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows Clairvoyante is sample agnostic and finds variants in less than 2 h on a standard server. Furthermore, we present 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source (https://github.com/aquaskyline/Clairvoyante), with modules to train, utilize and visualize the model. |
DOI | 10.1038/s41467-019-09025-z |
Alternate Journal | Nat Commun |
PubMed ID | 30824707 |
PubMed Central ID | PMC6397153 |
Grant List | R01 HG006677 / HG / NHGRI NIH HHS / United States; UM1 HG008898 / HG / NHGRI NIH HHS / United States |