Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation.

Title: Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation.
Publication Type: Journal Article
Year of Publication: 2023
Authors: Kolmogorov, M, Billingsley, KJ, Mastoras, M, Meredith, M, Monlong, J, Lorig-Roach, R, Asri, M, Jerez, PAlvarez, Malik, L, Dewan, R, Reed, X, Genner, RM, Daida, K, Behera, S, Shafin, K, Pesout, T, Prabakaran, J, Carnevali, P, Yang, J, Rhie, A, Scholz, SW, Traynor, BJ, Miga, KH, Jain, M, Timp, W, Phillippy, AM, Chaisson, M, Sedlazeck, FJ, Blauwendraat, C, Paten, B
Corporate Authors: North American Brain Expression Consortium (NABEC)
Journal: bioRxiv
Date Published: 2023 Apr 05
Abstract

Long-read sequencing technologies substantially overcome the limitations of short reads but to date have not been considered a feasible replacement at scale due to a combination of being too expensive, not scalable enough, or too error-prone. Here, we develop an efficient and scalable wet lab and computational protocol for Oxford Nanopore Technologies (ONT) long-read sequencing that seeks to provide a genuine alternative to short reads for large-scale genomics projects. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the NIH Center for Alzheimer's and Related Dementias (CARD). Using a single PromethION flow cell, we can detect SNPs with an F1-score better than that of Illumina short-read sequencing. Small indel calling remains difficult inside homopolymers and tandem repeats, but is comparable to Illumina calls elsewhere. Further, we can discover structural variants with an F1-score comparable to state-of-the-art methods involving Pacific Biosciences HiFi sequencing and trio information (but at a lower cost and greater throughput). Using ONT-based phasing, we can then combine and phase small and structural variants at megabase scales. Our protocol also produces highly accurate, haplotype-specific methylation calls. Overall, this makes large-scale long-read sequencing projects feasible; the protocol is currently being used to sequence thousands of brain-based genomes as a part of the NIH CARD initiative. We provide the protocol and software as open-source integrated pipelines for generating phased variant calls and assemblies.

DOI: 10.1101/2023.01.12.523790
Alternate Journal: bioRxiv
PubMed ID: 36711673
PubMed Central ID: PMC9882142
Grant List:
U01 HG010961 / HG / NHGRI NIH HHS / United States
P01 AG000538 / AG / NIA NIH HHS / United States
P30 AG072980 / AG / NIA NIH HHS / United States
OT3 HL142481 / HL / NHLBI NIH HHS / United States
OT2 OD033761 / OD / NIH HHS / United States
R01 HG011274 / HG / NHGRI NIH HHS / United States
U24 HG010262 / HG / NHGRI NIH HHS / United States
U24 HG011853 / HG / NHGRI NIH HHS / United States
ZIA NS003154 / ImNIH / Intramural NIH HHS / United States
R01 HG010485 / HG / NHGRI NIH HHS / United States
U24 NS072026 / NS / NINDS NIH HHS / United States
P30 AG019610 / AG / NIA NIH HHS / United States