A powerful analytic tool created specifically to empower biologists and other researchers to easily perform complex analyses and visualizations on genomic and phenotypic data.

Features

SNP & Variation Suite is a powerful analytic tool created specifically to empower biologists and other researchers to easily perform complex analyses and visualizations on genomic and phenotypic data. With SVS you can focus on your research instead of learning to be a programmer or waiting in line for bioinformaticians.

Numeric Analysis Methods
  • Principal component analysis for integer or quantitative data
  • Wave detection/correction
  • Matched pairs T-Test
  • Fisher's Exact Test for binary predictors and a binary dependent variable
  • Derivative log ratio spread
  • Percentile-based Winsorizing
  • Segmentation of log-ratio data to detect copy number regions
  • Standard sample statistics to summarize columns or rows of data
Genomic Visualization
  • Display p-value results, raw data and annotation sources all in the same view
  • Natural pan and zoom controls quickly allow you to zero in on a region of interest
  • A smart labeling system balances clarity with information density
Support & Extensibility
  • Technical manual with methods fully documented and explained
  • Customer support available by phone and e-mail
  • Training available on live web demonstrations
Data Management
  • Efficiently handle micro-array and whole-exome data for thousands of samples on a desktop computer
  • Scales to whole-genome and imputed datasets
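
To make one of the numeric methods listed above concrete, here is a minimal sketch of percentile-based Winsorizing: values below a lower percentile or above an upper percentile are clamped to those percentile values rather than discarded. The function name, the default cut-offs, and the nearest-rank percentile rule are assumptions chosen for illustration; they are not taken from SVS.

```python
import math


def winsorize(values, lower_pct=5.0, upper_pct=95.0):
    """Clamp values outside the [lower_pct, upper_pct] percentile range.

    Percentiles use the nearest-rank rule (an illustrative assumption).
    """
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)

    def percentile(pct):
        # Nearest-rank percentile: smallest value with at least pct% of data at or below it.
        rank = max(1, math.ceil(pct * n / 100))
        return ordered[rank - 1]

    low, high = percentile(lower_pct), percentile(upper_pct)
    # Clamp each value into [low, high], preserving the original order.
    return [min(max(v, low), high) for v in values]


if __name__ == "__main__":
    data = [-50] + list(range(2, 20)) + [500]
    print(winsorize(data, lower_pct=10, upper_pct=90))
```

Clamping rather than removing outliers keeps the number of observations per sample unchanged, which is convenient when downstream steps expect complete columns.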

Use Cases

GWAS

GWAS continues to be an effective method for identifying disease-susceptibility genes in humans and other organisms. SNP & Variation Suite empowers users to run basic and advanced SNP analyses, incorporating a number of intuitive workflows to lead you beyond single marker associations.

  • Powerful Genotype Association Testing and Statistics
    SVS offers a powerful and straightforward way of testing for genotypic association against either dichotomous or quantitative traits using one or more statistical measures under any one of several genetic model assumptions. These tests can be run individually or simultaneously while also correcting for stratification and applying multiple testing corrections (including permutation testing).
  • Meta-Analysis
    Meta-analysis takes the results of two or more GWAS for multiple SNPs or markers, computes standard meta-analysis statistics for each SNP, and compiles the results into one spreadsheet. SVS can perform meta-analysis on results created within the SVS software, from third-party software programs, or a combination of the two. Results for a fixed-effects model, a random-effects model, and tests for heterogeneity between studies are automatically computed for every meta-analysis performed.
  • Linkage Disequilibrium and Haplotype Analysis
    Interactively explore linkage disequilibrium (LD) and haplotypes in an innovative and powerful interface. You can view LD plots from one or more populations and explore them side-by-side with association results. For haplotype analysis it is easy to define and modify haplotype blocks from an LD plot or spreadsheet, compute haplotype and diplotype frequency tables, and perform a number of haplotype association tests and trend regression, including per-block and per-haplotype methods.
  • Regression Analysis
    SVS incorporates advanced regression technologies that enable you to perform linear and logistic regression, stepwise regression (both backward elimination and forward selection), gene-by-environment interaction regression, and permutation tests with numeric variables and recoded genotypes. You can use a moving window along with numeric or categorical covariates, against a single dependent variable. Regressions may either be performed with all variables and covariates together ("full model") or with some of the covariates grouped into a "reduced model" (yielding a full-vs-reduced model p-value).
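
As an illustration of the meta-analysis statistics mentioned above, the following sketch combines per-study effect estimates for a single SNP with a textbook inverse-variance fixed-effects model and reports Cochran's Q as a heterogeneity statistic. It is a generic example under stated assumptions, not the SVS implementation; the function name and the input layout (one effect size and standard error per study) are hypothetical.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance fixed-effects meta-analysis for one marker.

    Returns (pooled_beta, pooled_se, z, two_sided_p, cochran_q).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Cochran's Q: weighted squared deviations of each study from the pooled effect.
    cochran_q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    return pooled_beta, pooled_se, z, p_value, cochran_q


if __name__ == "__main__":
    # Two hypothetical studies reporting the same SNP.
    print(fixed_effects_meta([0.2, 0.4], [0.1, 0.1]))
```

Under the null hypothesis of no heterogeneity, Q is compared against a chi-squared distribution with (number of studies - 1) degrees of freedom; a random-effects model would additionally estimate a between-study variance from Q.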

Genomic Prediction

From obtaining allele substitution values to building predictive models, SNP & Variation Suite has all the tools for genomic prediction and visualization. Compare and contrast results using the available methods or pick your favorite method. Covariates can be included in every analysis and X-Chromosome correction is also available. SVS simplifies the entire genomic prediction process from data management to model building to visualization.

  • Genomic Prediction Methods
    Methods available in SVS include Genomic Best Linear Unbiased Predictors (GBLUP), Bayes C and Bayes C-Pi. These tools create and find a solution to, or an approximate solution to, one or more sets of mixed linear model equations. The genomic information from the samples is included in every model to obtain a "genomic prediction". Given the available dataset, genomic prediction methods can be used to build a prediction model that explains the association between the genotypes (genetic data) and the phenotype information best. This model can then be used in research to better understand the phenotype, and in commercial applications to improve decision making.
  • K-Fold Cross Validation
    Automatically build training and validation sets within SVS using K-Fold Cross Validation. Account for stratification when picking the samples for each set to ensure balanced sets to obtain the best prediction models. Then run genomic prediction for one or more genomic prediction methods directly from K-Fold Cross Validation to save time and mouse clicks. SVS's K-Fold Cross Validation will also ensure major and minor alleles are consistently encoded through each data subset to ensure consistent direction of effect.
  • Tumor/Normal Workflows
    VarSeq provides complete support for Tumor Normal workflows. Samples can be imported as matched pairs, allowing germ line variants to be filtered out in a single step. Multiple paired samples can be imported in a single project enabling fast and accurate analysis in settings where reproducibility is critical.
  • Applying a Prediction Model to New Data
    After building a model, apply it to a new dataset to predict the phenotype. If the phenotype values are known this can be used to validate the model. If unknown, this can be used to make decisions based on the genetic data for the samples without phenotype information based on the samples used to build the prediction model. SVS automatically adjusts for strand information to ensure consistent direction of effect between the model used for prediction and the dataset the model is applied to.
  • Visualization
    Visualize the predicted versus actual phenotypes in a cluster plot to gauge the accuracy of the prediction model. Getting to a scatter plot with a trend line is straightforward and you can color the data points by any covariates or by a stratifying variable. The normalized log-transformed allele substitution values are genomic data and, as with all genomic data in SNP & Variation Suite, plotting these values with GenomeBrowse provides the genomic context needed to interpret the markers with the largest influence in the prediction model and to identify key genes. Our live-streaming annotation repository as well as custom annotations for dozens of species can help decipher the significance of any results in the context of your research.
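
The stratified K-Fold Cross Validation described above can be sketched as follows: samples are grouped by a stratifying label, shuffled within each group, and dealt round-robin into K folds so that every fold holds a similar share of each stratum. This is a generic illustration, not the procedure SVS uses; the function name, the seed parameter, and the label values are assumptions.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Assign sample indices to k folds, balancing each stratum across folds."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for index, label in enumerate(labels):
        by_stratum[label].append(index)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_stratum):
        members = by_stratum[label]
        rng.shuffle(members)
        # Deal members round-robin; the offset keeps fold sizes even across strata.
        for i, index in enumerate(members):
            folds[(i + offset) % k].append(index)
        offset += len(members)
    return folds


if __name__ == "__main__":
    labels = ["case"] * 6 + ["control"] * 9
    for fold_number, fold in enumerate(stratified_k_fold(labels, k=3)):
        print(fold_number, sorted(fold))
```

Each fold serves once as the validation set while the remaining folds form the training set, so every sample is predicted exactly once by a model that never saw it.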

Imputation

Impute missing or incomplete genotypes in your GWAS workflows with SVS's adaptation of the mature BEAGLE 4.1 algorithm that is designed to scale to tens of thousands of samples and whole genome sequencing variation density.

  • Human & Animal Genomics
    If you are studying human populations, we provide publicly available subsets of pre-phased 1000 Genomes genotypes, filtered down to useful frequencies, to be used for imputation:

    • 5% allele frequency or greater (8.5 million variants)
    • 1% allele frequency or greater (14.2 million variants)
    • Allele count greater than 20 (~0.4% with 19.5 million variants)

    Another common use case in both human and agrigenomics involves imputing from one genotype array up to a reference panel with a higher marker density. Among other things, this allows you to leverage data from multiple GWAS conducted on different micro-array platforms. In this case, you are able to use our Create Imputation Reference Panel tool to create your own phased reference panel dataset you can use for imputation of your own data. Golden Helix SVS provides full support for non-human genomes and imputation also works for any species under study.
  • System Requirements
    The imputation capability is provided as part of an SVS Server license. The recommended minimum machine requirement to run SVS on a server with imputation is an 8-core machine with 16GB of RAM. The imputation program is multi-threaded and automatically detects the number of available CPU cores. Runtime is directly correlated to the number of CPU cores, so large imputation jobs will benefit from having as many CPU cores as possible on a single server.

Large Sample DNA-Sequencing Analysis

SNP & Variation Suite includes rare variant analysis tools with region-based collapsing methods for whole-genome and whole-exome DNA next-generation sequencing. For the first time, in a single, integrated desktop solution, you can perform standard variant association workflows for quality assurance and association analysis on hundreds to millions of common and rare variants for thousands of samples.

  • Data Import and Management
    Next-generation sequencing poses some unique data import and management challenges. Unlike microarrays where every sample is assayed for the same SNP set, next-generation sequencing generates variant calls unique to each sample. Most of these are considered rare and are not currently cataloged, which makes conventional data import and mapping difficult. SVS makes this easy with streamlined import and mapping of common and standardized formats such as variant call files (VCF) version 4.0 and higher. Furthermore, you can combine NGS data from multiple sources without having to worry about file format compatibility. If your files contain read depth and quality scores, they can be imported as well. From there, quality assurance, variant filtering, and analysis are as fast as ever.
  • Variant Classification
    Similar to the ANNOVAR program, variant classification examines the interactions between variants and gene transcripts in order to classify variants based on their potential effect on genes. Variants are classified according to their position in relation to gene transcripts (intronic, exonic, utr5, etc). In addition, variants in coding exons are further classified according to their effect on the gene's protein sequence (synonymous, non-synonymous, frame-shifting, etc). This gives insight into which variants are most likely to have functional effects.
  • Quality Assurance
    SVS provides a wide array of quality assurance measures to ensure your data is of the highest quality and your results are accurate. Standard quality assurance measures for small sample or small family exome or whole genome workflows are supported, including screening variants by read depth and other quality scores from the variant call files, by presence (or absence) in public annotation databases, by minor allele (alternate allele) frequency based on public catalogs, and by effect on protein coding. For small families, Mendelian error detection is also available.
  • Variant Filtering
    After performing extensive quality assurance on your data, the next step is sorting through all your variants to find those that really matter. Though more manageable on a whole-exome scale, this process can be daunting. SVS makes this process easy with filtering by annotation tracks. Using gene tracks, you can filter variants outside of genes or exons, leaving only those in coding regions. Public database probe tracks, such as dbSNP, 1000 Genomes, NHLBI ESP6500 Exomes, and ClinVar enable you to exclude variants considered common. The dbNSFP NS Functional Predictions track can be used to filter out variants that are predicted as tolerated or benign based on the following functional predictions: SIFT, PolyPhen2, MutationTaster, GERP++, PhyloP and more. One or all of these predictions can be used to measure how likely a variant is to be damaging and filter out those considered benign. Functional prediction filtering is especially helpful for targeted resequencing projects where you are trying to locate causal variants based on GWAS results. You can also use case-control or familial data to identify variants that are unique to affected individuals only.
  • Rare Variant Burden and Association Testing
    SVS employs several collapsing methods that enable you to perform association testing with your sequence data. The simplest method creates a binary covariate per gene whereby each sample is assigned a one or zero based on the presence or absence of at least one rare variant in each gene. A slightly more sophisticated approach creates an integer covariate for each gene by counting the number of variants for a given sample in each gene. Using the software's powerful numeric association testing and regression analysis capabilities, you can then perform association testing with these gene-based covariates.
  • Variant Frequency by MAF
    Due to cost, most next-generation studies thus far have involved a relatively small number of samples compared to traditional GWAS studies. This makes it difficult to calculate in-sample minor allele frequency (MAF) to identify how rare a variant is. Variant Frequency Binning by MAF uses the MAFs of an external reference population to classify the variants in your own samples in terms of rarity.
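
The two collapsing methods described above (a binary covariate and an integer count covariate per gene) can be sketched in a few lines. This is an illustrative example only; the input layout (per-sample alternate-allele counts keyed by variant ID, a variant-to-gene map, and a set of variants already classified as rare) and the function name are assumptions, not the SVS data model.

```python
def collapse_by_gene(genotypes, variant_gene, rare_variants):
    """Build per-gene burden covariates for each sample.

    genotypes:     {sample: {variant_id: alternate_allele_count}}
    variant_gene:  {variant_id: gene_name}
    rare_variants: set of variant IDs considered rare
    Returns (binary, counts), each shaped {sample: {gene: value}}.
    """
    genes = sorted(set(variant_gene.values()))
    binary, counts = {}, {}
    for sample, calls in genotypes.items():
        per_gene = {gene: 0 for gene in genes}
        for variant, alt_count in calls.items():
            # Count a variant once if it is rare and the sample carries it.
            if variant in rare_variants and alt_count > 0:
                per_gene[variant_gene[variant]] += 1
        counts[sample] = per_gene
        binary[sample] = {gene: int(n > 0) for gene, n in per_gene.items()}
    return binary, counts


if __name__ == "__main__":
    genotypes = {
        "s1": {"v1": 1, "v2": 0, "v3": 2},
        "s2": {"v1": 0, "v2": 0, "v3": 0},
        "s3": {"v1": 1, "v2": 1, "v4": 1},
    }
    variant_gene = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B", "v4": "GENE_B"}
    print(collapse_by_gene(genotypes, variant_gene, rare_variants={"v1", "v2", "v3"}))
```

Either covariate table can then be tested against a phenotype like any other numeric predictor, which is what makes collapsing useful when individual rare variants are too infrequent to test one at a time.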

RNA-Sequencing Analysis

SNP & Variation Suite offers advanced analysis tools designed to perform differential expression workflows for RNA expression profiling experiments. Regardless of the upstream secondary analysis tool used to align and quantify reads into weighted counts, SVS provides all the data normalization, differential expression, and visualization techniques needed to be able to conduct RNA sequencing analysis quickly and easily, giving you everything you might expect from expression micro-arrays.

  • DESeq Analysis
    Taking advantage of analysis techniques developed by Anders and Huber 2010, the DESeq tool is designed to estimate variance-mean dependence in count data and test for differential expression between types using a model based on the negative binomial distribution. DESeq in SVS not only calculates the mean values from your genes or transcripts for each group, but also detects the squared coefficient of variation (SCV). This approach helps to recognize those transcripts with the highest consistency by providing p-values and fold change between each study group while filtering out erratic variations found within certain transcripts.
  • Normalization and Log Transformation
    Various aspects of the RNA-Seq sample preparation and sequencing process can result in extremely high variance of read counts within a sample and between samples, even when each sample is sequenced with the same target depth. While DESeq has a built-in normalization method, you can also normalize your data as outlined by Bullard et al. 2010. This normalized data can then be used in PCA to see if your biological factors are driving the primary principal components or to run association analysis with some of our many supported statistical tests such as T-Test and regression with optional covariates.
  • Visualization
    Advanced visualization can be used to interpret the analysis of your RNA-seq differential expression. Getting to a standard volcano plot showing p-values versus fold-change is a cinch, and you can interactively set thresholds on the data to see which genes show statistical significance and large-magnitude count differences. Your top genes in their normalized form are output from DESeq and can be hierarchically clustered and plotted in a heatmap. The dendrograms on both the sample and gene axes provide clear feedback that the undirected clustering followed the biological grouping and that the statistical test identified genes with stark differences in expression between groups.
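
The volcano plot thresholds described above amount to two cut-offs, one on the p-value and one on the magnitude of the log2 fold change. The sketch below selects the genes that pass both and returns the coordinates they would have on a volcano plot. The function name, the default thresholds, and the input layout are assumptions made for illustration.

```python
import math


def significant_genes(results, p_max=0.05, min_abs_log2_fc=1.0):
    """Select genes passing both volcano-plot thresholds.

    results: {gene: (p_value, fold_change)} with p_value > 0 and fold_change > 0.
    Returns a list of (gene, -log10(p), log2(fold_change)), most significant first.
    """
    hits = []
    for gene, (p_value, fold_change) in results.items():
        if p_value <= 0 or fold_change <= 0:
            continue  # Logarithms are undefined; skip malformed rows.
        log2_fc = math.log2(fold_change)
        if p_value <= p_max and abs(log2_fc) >= min_abs_log2_fc:
            hits.append((gene, -math.log10(p_value), log2_fc))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    example = {
        "A": (0.001, 4.0),   # significant, up-regulated
        "B": (0.2, 8.0),     # large change but not significant
        "C": (0.01, 1.1),    # significant but small change
        "D": (0.0001, 0.25), # significant, down-regulated
    }
    for row in significant_genes(example):
        print(row)
```

Taking the absolute value of the log2 fold change treats up- and down-regulated genes symmetrically, which is why both arms of the volcano are retained.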

Small Sample DNA-Seq Workflows

SNP & Variation Suite delivers the most powerful rare variant filtering workflows with the latest annotation sources and GenomeBrowse visualization. For the first time, in a single, integrated desktop solution, you can interactively filter hundreds to millions of common and rare variants down to a handful of potentially pathogenic variants.

    • Data Import and Management
      Next-generation sequencing poses some unique data import and management challenges. Unlike microarrays where every sample is assayed for the same SNP set, next-generation sequencing generates variant calls unique to each sample. Most of these are considered rare and are not currently cataloged, which makes conventional data import and mapping difficult. SVS makes this easy with streamlined import and mapping of common and standardized formats such as variant call files (VCF) version 4.0 and higher. Furthermore, you can combine NGS data from multiple sources without having to worry about file format compatibility. If your files contain read depth and quality scores, they can be imported as well. From there, quality assurance, variant filtering, and analysis are as fast as ever.
    • Quality Assurance
      SVS provides a wide array of quality assurance measures to ensure your data is of the highest quality and your results are accurate. Standard quality assurance measures for small sample or small family exome or whole genome workflows are supported, including screening variants by read depth and other quality scores from the variant call files, by presence (or absence) in public annotation databases, by minor allele (alternate allele) frequency based on public catalogs, and by effect on protein coding. For small families, Mendelian error detection is also available.
    • Variant Classification
      Similar to the ANNOVAR program, variant classification examines the interactions between variants and gene transcripts in order to classify variants based on their potential effect on genes. Variants are classified according to their position in relation to gene transcripts (intronic, exonic, utr5, etc). In addition, variants in coding exons are further classified according to their effect on the gene's protein sequence (synonymous, non-synonymous, frame-shifting, etc). This gives insight into which variants are most likely to have functional effects.
    • Variant Filtering
      After performing extensive quality assurance on your data, the next step is sorting through all your variants to find those that really matter. Though more manageable on a whole-exome scale, this process can be daunting. The 1000 Genomes Project has already cataloged more than 25 million variants, 18 million more than dbSNP 135 (the closest thing to a database for common variants). Each person is expected to have roughly 4 million variants, 20 thousand in coding regions, and 250-300 that are potentially damaging. How do you distinguish the relatively small number of damaging variants from those that are benign? SVS makes this process easy with filtering by annotation tracks. Using gene tracks, you can filter variants outside of genes or exons, leaving only those in coding regions. Public database probe tracks, such as dbSNP, 1000 Genomes, NHLBI ESP6500 Exomes, and ClinVar enable you to exclude variants considered common. The dbNSFP NS Functional Predictions track can be used to filter out variants that are predicted as tolerated or benign based on the following functional predictions: SIFT, PolyPhen2, MutationTaster, GERP++, PhyloP and more. One or all of these predictions can be used to measure how likely a variant is to be damaging and filter out those considered benign. Functional prediction filtering is especially helpful for targeted resequencing projects where you are trying to locate causal variants based on GWAS results. You can also use case-control or familial data to identify variants that are unique to affected individuals only.
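
The annotation-based filtering chain described above (restrict to coding regions, exclude variants that are common in public catalogs, then drop variants predicted benign) can be sketched as a simple predicate. Everything named here is hypothetical: the `Variant` fields, the thresholds, and the idea of counting how many prediction tools call a variant damaging are illustrative assumptions, not SVS annotation track fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    variant_id: str
    in_coding_exon: bool             # from a gene track
    population_maf: Optional[float]  # None if absent from public catalogs
    damaging_votes: int              # number of prediction tools calling it damaging


def filter_candidates(variants, max_maf=0.01, min_damaging_votes=1):
    """Keep rare, coding variants with at least one damaging prediction."""
    kept = []
    for variant in variants:
        if not variant.in_coding_exon:
            continue  # outside coding regions
        if variant.population_maf is not None and variant.population_maf >= max_maf:
            continue  # considered common in the reference catalogs
        if variant.damaging_votes < min_damaging_votes:
            continue  # predicted tolerated or benign by every tool
        kept.append(variant)
    return kept


if __name__ == "__main__":
    variants = [
        Variant("v1", True, None, 3),     # novel, coding, damaging: kept
        Variant("v2", True, 0.20, 2),     # common: removed
        Variant("v3", False, 0.001, 4),   # non-coding: removed
        Variant("v4", True, 0.002, 0),    # predicted benign: removed
    ]
    print([v.variant_id for v in filter_candidates(variants)])
```

Raising `min_damaging_votes` requires agreement among several prediction tools, which trades sensitivity for a shorter candidate list.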


Copy Number Analysis

SNP & Variation Suite offers a complete set of tools for processing raw intensity data, identifying regions of copy number variation (CNV), visualizing copy number data, and performing association analyses on a variety of copy number covariates. From cytogenetic research to genome-wide copy number association from micro-arrays, SVS delivers a powerful toolset for correlating common and rare chromosomal aberrations with disease.

  • Data Processing
    SVS offers direct import of log ratio data from a number of providers including Affymetrix, Agilent, NimbleGen, and Illumina. For Affymetrix CEL files (500K, 5.0, and 6.0), a powerful processing tool enables you to run quantile normalization on the A and B probe intensities, including virtual array generation to merge CN and SNP probes or multiple arrays (e.g. NSP and STY). This process scales to thousands of samples and can use any sample set as a reference.
  • CNV Association Testing
    A number of covariate generation procedures enable you to perform association testing on raw or PCA-corrected log ratios, CNV segment means, and discretized values based on three- and two-state models representing loss, neutral, and gain. Perform numeric association tests or advanced linear and logistic regression with CNV covariates alone or in combination with other genetic markers and phenotypic variables.
  • Copy Number Detection with Optimal Segmenting
    SVS employs a powerful optimal segmenting algorithm called Copy Number Analysis Method (CNAM) using dynamic programming to detect inherited and de novo CNVs on a per-sample (univariate) and multi-sample (multivariate) basis. Unlike Hidden Markov Models, which assume the means of different copy number states are consistent, optimal segmenting properly delineates CNV boundaries in the presence of mosaicism, even at a single probe level, and with controllable sensitivity and false discovery rate. Optimal segmenting incorporates a parallelized, unbiased randomization permutation procedure that uses all available cores on your computer. The permutation procedure replaces a naïve, potentially biased randomization procedure with the unbiased Fisher and Yates method (also known as the Knuth shuffle). An added option allows you to further refine your segments by efficiently removing univariate outliers during the segmentation process.
  • Detecting and Correcting for Plate/Batch Effects, Genomic Waves, and other Quality Issues
    For both micro-array and aCGH data, significant bias can be introduced by batch effects (plate, machine, and site variation), genomic waves, and population stratification. Other sources of variation include sample extraction and preparation procedures, cell types, temperature fluctuation, and even ambient ozone levels in a lab. These can lead to complications ranging from poorly defined segments to false and non-replicable findings. SVS offers a number of tools not only to detect these data quality problems but to correct for them as well.
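
The Fisher and Yates (Knuth) shuffle named above is short enough to show in full. The sketch below is the standard textbook algorithm, not code from SVS; the function name and the optional `rng` parameter are illustrative. It walks the list from the last position down, swapping each element with one chosen uniformly from the positions at or before it, which makes every permutation equally likely.

```python
import random


def fisher_yates_shuffle(items, rng=None):
    """Return an unbiased random permutation of items (input is not modified)."""
    if rng is None:
        rng = random.Random()
    shuffled = list(items)
    for i in range(len(shuffled) - 1, 0, -1):
        # Pick j uniformly from 0..i inclusive, then swap positions i and j.
        j = rng.randint(0, i)
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


if __name__ == "__main__":
    log_ratios = [0.1, -0.3, 0.0, 0.8, -0.2]
    print(fisher_yates_shuffle(log_ratios, rng=random.Random(42)))
```

A common naïve variant swaps each position with an index drawn from the whole list instead of from 0..i; that version generates n^n equally likely swap sequences, which cannot map evenly onto the n! permutations for n > 2, so some orderings come up more often than others.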

Case Studies

We know our software will exceed your expectations. But don't just take it from us; see how our customers have benefited from it.

Recommended Learning Materials

We have a variety of supplemental learning materials that are an excellent resource for anyone interested in the industry or our software solutions. Here are some of our recommended materials for you to check out related to SVS!

eBooks

Check out our eBooks on a variety of interesting topics.

Other Resources

Explore a clinical workflow in VarSeq or follow along with a tutorial!

SVS Viewer:
Download Here


Introduction to SVS:
Download Here

Evaluation

Request a free trial of SVS:


Stay updated with exclusive eBooks, timely invitations to webcasts and events, and other communications from Golden Helix.

Technical Specifications

GENERAL PURPOSE HARDWARE REQUIREMENTS

4 GB of RAM

Multicore CPU

100GB of space available for annotations and projects

ADVANCED AND WHOLE GENOME WORKFLOW HARDWARE REQUIREMENTS

If you are working with whole exomes or genomes, especially for hundreds to thousands of samples, we suggest a high-memory configuration and plenty of storage capacity:

16GB+ of RAM (32GB for Servers)

8+ CPU Cores

1TB of space available for annotations and projects

OPERATING SYSTEMS

The following operating systems are supported:

64-bit Windows 7 or later (32-bit also supported, but not recommended)

Linux Ubuntu 14.04 or later (64-bit only)

Linux RHEL 6 or later, or equivalently CentOS 6 or later (64-bit only)

Mac OS X 10.9 or later

SERVER CONFIGURATIONS

With a server license, you can install your Golden Helix software solution on a server with multi-user access and shared resources. You can launch any number of instances of the software on the same host, and are only limited by the natural CPU, Memory and Disk resources of the server.

For Windows, you need multi-user Remote Desktop, which is only available on Windows Server. We support Windows Server 2008 or newer.

On Linux, clients can log in from any operating system using SSH and open the Golden Helix software using X11 tunneling to interact with the software. On Windows, we suggest a solution like MobaXterm that provides an all-in-one SSH client and X11 server to enable easy logging in, file transfer and opening of remote GUI applications.

PROXY SETTINGS, FIREWALLS AND ANTIVIRUS

Golden Helix VarSeq and SVS can be configured to access the internet through a SOCKS5 or HTTP/HTTPS Tunneling Proxy. Go to Tools -> Proxy Settings… to configure.

The software only needs to make outgoing connections on standard HTTP/HTTPS ports and protocols. If a local firewall is installed that prevents these types of outgoing connections (this is very uncommon), firewall rules will need to be created to whitelist the software.

Note that we have run into numerous issues where aggressive anti-virus programs prevent the product from performing normal operations such as opening files and logging in. You may need to whitelist Golden Helix executables or disable these tools to perform your analysis.