Large Sample DNA-Seq Workflows
SNP & Variation Suite includes rare variant analysis tools with region-based collapsing methods for whole-genome and whole-exome DNA next-generation sequencing. For the first time, in a single, integrated desktop solution, you can perform standard variant association workflows for quality assurance and association analysis on hundreds to millions of common and rare variants for thousands of samples.
Data Import and Management
Next-generation sequencing poses some unique data import and management challenges. Unlike microarrays, where every sample is assayed for the same SNP set, next-generation sequencing generates variant calls unique to each sample. Most of these are considered rare and are not currently cataloged, which makes conventional data import and mapping difficult. SVS makes this easy with streamlined import and mapping of common and standardized formats such as Variant Call Format (VCF) files, version 4.0 and higher. Furthermore, you can combine NGS data from multiple sources without having to worry about file format compatibility. If your files contain read depth and quality scores, they can be imported as well. From there, quality assurance, variant filtering, and analysis are as fast as ever.
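As a rough illustration of what importing a VCF record with read depth and quality scores involves, the following Python sketch splits one tab-delimited VCF data line into its fixed columns and per-sample fields. It is not how SVS reads files; the function name `parse_vcf_line`, the sample names, and the example line are assumptions made for this example, and header parsing is left out.

```python
from typing import Dict, List


def parse_vcf_line(line: str, sample_names: List[str]) -> Dict[str, object]:
    """Parse one VCF data line (not a header line) into a plain dictionary."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alt, qual = fields[:6]
    # Column 9 (index 8) names the per-sample keys, e.g. GT:DP:GQ.
    format_keys = fields[8].split(":")
    samples = {}
    for name, raw in zip(sample_names, fields[9:]):
        # Values stay as strings; convert DP or GQ to numbers where needed.
        samples[name] = dict(zip(format_keys, raw.split(":")))
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt.split(","),  # several alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "samples": samples,
    }


# Hypothetical two-sample record with genotype, read depth, and genotype quality.
example_line = "chr1\t12345\t.\tA\tG\t50.0\tPASS\t.\tGT:DP:GQ\t0/1:32:99\t0/0:18:60"
record = parse_vcf_line(example_line, ["S1", "S2"])
print(record["chrom"], record["pos"], record["samples"]["S1"])
```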
SVS provides a wide array of quality assurance measures to help ensure that your data is of high quality and your results are accurate. Standard quality assurance measures for small-sample or small-family exome and whole-genome workflows are supported, including screening out variants with poor read depth or other low quality scores in the variant call files, filtering on presence (or absence) in public annotation databases, filtering on minor allele (alternate allele) frequency from public catalogs, and filtering on whether a variant affects protein coding. For small families, Mendelian error detection is also available.
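The read-depth and quality screening described above can be sketched in a few lines of Python. This is an illustration only, not the SVS implementation; the `Call` record, its field names, and the thresholds (`min_depth=10`, `min_qual=30.0`) are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Call:
    """One variant call for one sample (hypothetical record layout)."""
    variant_id: str
    read_depth: int   # for example, a DP value imported from a VCF
    quality: float    # for example, a GQ or QUAL value imported from a VCF


def passes_qa(call: Call, min_depth: int = 10, min_qual: float = 30.0) -> bool:
    """Keep a call only if both depth and quality meet the thresholds."""
    return call.read_depth >= min_depth and call.quality >= min_qual


def filter_calls(calls: List[Call], min_depth: int = 10,
                 min_qual: float = 30.0) -> List[Call]:
    """Return the calls that pass both thresholds, in their original order."""
    return [c for c in calls if passes_qa(c, min_depth, min_qual)]


calls = [
    Call("chr1:1000:A>G", read_depth=35, quality=99.0),
    Call("chr1:2000:C>T", read_depth=4, quality=60.0),   # too shallow
    Call("chr2:500:G>A", read_depth=20, quality=12.0),   # low quality
]
kept = filter_calls(calls)
print([c.variant_id for c in kept])
```

Suitable thresholds depend on the sequencing platform and coverage, so the defaults here should be treated as placeholders.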
Similar to the ANNOVAR program, variant classification examines the interactions between variants and gene transcripts in order to classify variants based on their potential effect on genes. Variants are classified according to their position in relation to gene transcripts (intronic, exonic, utr5, etc.). In addition, variants in coding exons are further classified according to their effect on the gene's protein sequence (synonymous, non-synonymous, frame-shifting, etc.). This gives insight into which variants are most likely to have functional effects.
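A minimal sketch of the positional part of this classification is shown below. It assumes a simplified, hypothetical `Transcript` model with 1-based inclusive coordinates and checks a single position against a single transcript; it does not reproduce the rules used by SVS or ANNOVAR, and the label `outside_transcript` is an illustrative name.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    """Hypothetical transcript model; coordinates are 1-based and inclusive."""
    exons: List[Tuple[int, int]]  # (start, end) pairs in genomic coordinates
    cds_start: int                # lowest coding coordinate
    cds_end: int                  # highest coding coordinate
    strand: str = "+"


def classify_position(pos: int, tx: Transcript) -> str:
    """Label a position relative to one transcript."""
    tx_start = min(start for start, _ in tx.exons)
    tx_end = max(end for _, end in tx.exons)
    if pos < tx_start or pos > tx_end:
        return "outside_transcript"
    if not any(start <= pos <= end for start, end in tx.exons):
        return "intronic"
    # Inside an exon: untranslated regions flank the coding sequence,
    # and which side is 5' depends on the strand.
    if pos < tx.cds_start:
        return "utr5" if tx.strand == "+" else "utr3"
    if pos > tx.cds_end:
        return "utr3" if tx.strand == "+" else "utr5"
    return "exonic"


tx = Transcript(exons=[(100, 200), (300, 400), (500, 600)],
                cds_start=150, cds_end=550)
for position in (50, 120, 180, 250, 580):
    print(position, classify_position(position, tx))
```

Classifying the protein-level effect (synonymous, non-synonymous, frame-shifting) additionally requires the reference sequence and a codon table, which this sketch leaves out.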
After performing extensive quality assurance on your data, the next step is sorting through all your variants to find those that really matter. Though more manageable on a whole-exome scale, this process can be daunting. The 1000 Genomes Project has already cataloged more than 25 million variants, 18 million more than dbSNP 135 (the closest thing to a database for common variants). Each person is expected to have roughly 4 million variants, 20 thousand in coding regions, and 250-300 that are potentially damaging. How do you distinguish the relatively small number of damaging variants from those that are benign?
SVS makes this process easy with filtering by annotation tracks. Using gene tracks, you can filter out variants that fall outside of genes or exons, leaving only those in coding regions. Public database probe tracks, such as dbSNP, 1000 Genomes, NHLBI ESP6500 Exomes, and ClinVar, enable you to exclude variants considered common.
The dbNSFP NS Functional Predictions track can be used to filter out variants that are predicted as tolerated or benign based on the following functional predictions: SIFT, PolyPhen2, MutationTaster, GERP++, PhyloP and more. One or all of these predictions can be used to measure how likely a variant is to be damaging and filter out those considered benign. Functional prediction filtering is especially helpful for targeted resequencing projects where you are trying to locate causal variants based on GWAS results. You can also use case-control or familial data to identify variants that are unique to affected individuals only.
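The annotation-based filters described above amount to a chain of keep-or-drop tests per variant. The sketch below is a hedged illustration of that idea in Python; the dictionary keys (`in_coding_exon`, `population_af`, `prediction`), the 1% frequency cutoff, and the prediction labels are assumptions for the example and do not correspond to actual SVS track fields.

```python
from typing import Dict, List, Optional

Variant = Dict[str, object]


def keep_candidate(variant: Variant, max_pop_af: float = 0.01) -> bool:
    """Keep coding variants that are rare in public catalogs and not predicted benign."""
    in_exon = bool(variant.get("in_coding_exon"))
    pop_af: Optional[float] = variant.get("population_af")  # None: not cataloged
    is_rare = pop_af is None or pop_af <= max_pop_af
    # Variants without a functional prediction are kept rather than dropped.
    not_benign = variant.get("prediction") not in ("tolerated", "benign")
    return in_exon and is_rare and not_benign


variants: List[Variant] = [
    {"id": "v1", "in_coding_exon": True, "population_af": None, "prediction": "damaging"},
    {"id": "v2", "in_coding_exon": True, "population_af": 0.25, "prediction": "damaging"},
    {"id": "v3", "in_coding_exon": False, "population_af": 0.001, "prediction": "damaging"},
    {"id": "v4", "in_coding_exon": True, "population_af": 0.002, "prediction": "tolerated"},
    {"id": "v5", "in_coding_exon": True, "population_af": 0.005, "prediction": None},
]
candidates = [v["id"] for v in variants if keep_candidate(v)]
print(candidates)
```

Keeping variants that lack a prediction is a design choice made here so that unannotated variants are not silently discarded; a stricter workflow could drop them instead.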
Rare Variant Burden and Association Testing
In rare variant analysis, it's hypothesized that rather than having a single causal variant, multiple variants have a compound effect on the trait of interest, referred to as rare variant burden. Traditional single-marker association techniques used in GWAS do not have the power to detect rare variants or provide tools for measuring their compound effect. To do this, it is necessary to "collapse" several variants into a single covariate based on regions such as genes.
SVS employs several collapsing methods that enable you to perform association testing with your sequence data. The simplest method creates a binary covariate per gene whereby each sample is assigned a one or zero based on the presence or absence of at least one rare variant in each gene. A slightly more sophisticated approach creates an integer covariate for each gene by counting the number of variants for a given sample in each gene. Using the software's powerful numeric association testing and regression analysis capabilities, you can then perform association testing with these gene-based covariates.
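The two simple collapsing schemes described above can be sketched as follows. This is an illustrative Python version, not the SVS implementation; the function name `collapse_by_gene`, the input layout (a set of rare variant IDs per sample plus a variant-to-gene map), and the sample and gene names are assumptions. The count covariate here counts distinct variant sites carried, not alternate alleles.

```python
from typing import Dict, Set, Tuple

Covariates = Dict[str, Dict[str, int]]  # gene -> sample -> value


def collapse_by_gene(rare_variants_by_sample: Dict[str, Set[str]],
                     gene_of: Dict[str, str]) -> Tuple[Covariates, Covariates]:
    """Build per-gene binary and count covariates from rare variant carriers."""
    genes = sorted(set(gene_of.values()))
    counts: Covariates = {g: {s: 0 for s in rare_variants_by_sample} for g in genes}
    for sample, variants in rare_variants_by_sample.items():
        for variant in variants:
            gene = gene_of.get(variant)
            if gene is not None:  # ignore variants that map to no gene
                counts[gene][sample] += 1
    # Binary covariate: 1 if the sample carries at least one rare variant in the gene.
    binary: Covariates = {
        g: {s: int(n > 0) for s, n in per_sample.items()}
        for g, per_sample in counts.items()
    }
    return binary, counts


carriers = {"S1": {"v1", "v2"}, "S2": {"v3"}, "S3": set()}
gene_of = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B"}
binary, counts = collapse_by_gene(carriers, gene_of)
print(binary["GENE_A"])
print(counts["GENE_A"])
```

Either table can then be passed, one gene at a time, to a standard association test or regression against the phenotype.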
More advanced methods in SVS are the Combined Multivariate and Collapsing (CMC) method by Li and Leal and the Kernel-Based Adaptive Cluster (KBAC) method by Liu and Leal. CMC first bins variants according to a criterion such as minor allele frequency, then collapses the variants within each bin, and finally performs multivariate testing on the counts across the various bins. KBAC differs from CMC in that both variant classification and association testing are unified into a single procedure. KBAC models the risk associated with multi-site genotypes rather than collapsing individual genotypes based on specified bins.
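To make the first two CMC steps concrete, the sketch below bins variants by minor allele frequency and collapses each bin into a per-sample indicator. It covers only binning and collapsing; the final multivariate test across bins is omitted, and KBAC is not shown. The bin edges, variant IDs, and function names are assumptions for illustration and are not taken from SVS.

```python
from bisect import bisect_right
from typing import Dict, List


def maf_bin(maf: float, edges: List[float]) -> int:
    """Return the bin index for a MAF, given ascending bin edges.

    With edges [0.01, 0.05]: bin 0 is MAF < 0.01, bin 1 is 0.01 <= MAF < 0.05,
    and bin 2 is MAF >= 0.05.
    """
    return bisect_right(edges, maf)


def collapse_bins(genotypes: Dict[str, Dict[str, int]],
                  maf: Dict[str, float],
                  edges: List[float]) -> Dict[str, List[int]]:
    """Map each sample to one 0/1 indicator per MAF bin.

    genotypes: sample -> {variant ID: alternate allele count (0, 1, or 2)}
    """
    n_bins = len(edges) + 1
    collapsed = {}
    for sample, calls in genotypes.items():
        indicators = [0] * n_bins
        for variant, alt_count in calls.items():
            if alt_count > 0:
                indicators[maf_bin(maf[variant], edges)] = 1
        collapsed[sample] = indicators
    return collapsed


maf = {"v1": 0.002, "v2": 0.008, "v3": 0.03, "v4": 0.2}
genotypes = {
    "S1": {"v1": 1, "v2": 1},
    "S2": {"v3": 1, "v4": 2},
    "S3": {"v1": 0, "v4": 1},
}
collapsed = collapse_bins(genotypes, maf, edges=[0.01, 0.05])
print(collapsed)
```

The resulting per-sample indicator vectors are the input to the multivariate comparison between cases and controls.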
CMC in SVS allows for quantitative phenotypes and both CMC and KBAC are able to correct for covariates and confounders in permutation testing, resulting in fewer false positives. Using one of these approaches will give greater power to detect the significance of rarer variants.
Variant Frequency by MAF
Due to cost, most next-generation studies thus far have involved a relatively small number of samples compared to traditional GWAS studies. This makes it difficult to calculate in-sample minor allele frequency (MAF) to identify how rare a variant is. Variant Frequency Binning by MAF uses the MAFs of an external reference population to classify the variants in your own samples in terms of rarity.
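The idea of classifying rarity from an external reference population can be sketched as a simple lookup. The category names, the 1% and 5% cutoffs, and the `reference_maf` table below are assumptions for illustration only; they are not the bins or data sources used by SVS.

```python
from typing import Dict, List


def classify_rarity(variant_id: str, reference_maf: Dict[str, float],
                    rare_below: float = 0.01, low_below: float = 0.05) -> str:
    """Classify a variant by its MAF in an external reference population."""
    maf = reference_maf.get(variant_id)
    if maf is None:
        return "novel"  # absent from the reference catalog
    if maf < rare_below:
        return "rare"
    if maf < low_below:
        return "low frequency"
    return "common"


# Hypothetical reference frequencies keyed by variant ID.
reference_maf = {"v1": 0.002, "v2": 0.03, "v3": 0.4}
study_variants: List[str] = ["v1", "v2", "v3", "v4"]
labels = {v: classify_rarity(v, reference_maf) for v in study_variants}
print(labels)
```

Because the classification relies on the reference table rather than on in-sample frequencies, it works the same way for a study of ten samples as for one of ten thousand.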
Examining variants in GenomeBrowse
Variant map visualization provides a practical representation of variant call spreadsheets in the context of public annotations, including gene and exon regions. With a quick glance at a variant map, where variants can be colored by allele or by variant classification (for example, as an insertion or deletion), researchers can immediately see the results of their filtering workflows. Adding annotation sources gives a more complete picture, helping you better understand the relevance of variants that may have been found to be disease gene related, medically actionable, or potentially deleterious. Learn more about GenomeBrowse as a stand-alone tool.