Benchmarking challenging small variants with linked and long reads.

Title	Benchmarking challenging small variants with linked and long reads.
Publication Type	Journal Article
Year of Publication	2022
Authors	Wagner, J, Olson, ND, Harris, L, Khan, Z, Farek, J, Mahmoud, M, Stankovic, A, Kovacevic, V, Yoo, B, Miller, N, Rosenfeld, JA, Ni, B, Zarate, S, Kirsche, M, Aganezov, S, Schatz, MC, Narzisi, G, Byrska-Bishop, M, Clarke, W, Evani, US, Markello, C, Shafin, K, Zhou, X, Sidow, A, Bansal, V, Ebert, P, Marschall, T, Lansdorp, P, Hanlon, V, Mattsson, C-A, Martinez Barrio, A, Fiddes, IT, Xiao, C, Fungtammasan, A, Chin, C-S, Wenger, AM, Rowell, WJ, Sedlazeck, FJ, Carroll, A, Salit, M, Zook, JM
Journal	Cell Genom
Volume	2
Issue	5
Date Published	2022 May
ISSN	2666-979X
Abstract	Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously. For HG002, we include 92% of the autosomal GRCh38 assembly while excluding regions problematic for benchmarking small variants, such as copy number variants, that should not have been in the previous version, which included 85% of GRCh38. It identifies eight times more false negatives in a short read variant call set relative to our previous benchmark. We demonstrate that this benchmark reliably identifies false positives and false negatives across technologies, enabling ongoing methods development.
DOI	10.1016/j.xgen.2022.100128
Alternate Journal	Cell Genom
PubMed ID	36452119
PubMed Central ID	PMC9706577
Grant List	9999-NIST / ImNIST / Intramural NIST DOC / United States; R01 HG010759 / HG / NHGRI NIH HHS / United States