Welcome to Version 8 of SNP & Variation Suite (SVS), an integrated collection of user-friendly, yet powerful analytic tools for managing, analyzing, and visualizing multifaceted genomic and phenotypic data. Version 8 of SVS improves on many of the things you love about Version 7 and offers a plethora of new features that truly earn it the distinction of a version upgrade.
In September 2012, the Golden Helix GenomeBrowse® visualization tool was introduced to the genomic community as an intuitive way to view DNA-seq and RNA-seq pile-up and coverage data. As a free, stand-alone product, GenomeBrowse quickly gained popularity and today boasts over 2,500 users.
Since then, users of SNP & Variation Suite who also wanted to see visualizations in GenomeBrowse had to make do with a clunky copy-and-paste workflow to get data from one product to the other. Additionally, while SVS did have a genome browser, it could not provide the seamless user experience that GenomeBrowse could.
Version 8 of SNP & Variation Suite takes visualization to a whole new level with the integration of GenomeBrowse for viewing any data in genomic space inside of SVS. Import your data once and have it at your fingertips for filtering, quality assurance, analysis, and then visualization with just one click. If you haven't yet tried GenomeBrowse, expect a fluid interface with easy-to-use controls such as zooming via the mouse scroll wheel as well as powerful options for changing the view like the ability to set the y-axis.
And when you are ready to publish, save your GenomeBrowse window as an image in a variety of file formats including PNG, JPG, and TIF.
With the release of SVS Version 8, annotation tracks now use a completely re-envisioned file format: TSF (replacing the old IDF file format). While the details are pretty cool, what you really need to know is that TSF gives the user more efficient data storage (at times up to 80% smaller file sizes) while providing an expanded number of field types and integrated field and source level documentation.
The conversion to TSF file format gave us the opportunity to create an all new data conversion wizard that can accept, churn, and spit out almost anything you throw at it. The benefit? The ability to make a custom annotation track in no time flat. (Our in-house data curation team loves it.)
On the subject of convenience, Version 8 also introduces a new Data Source Library dialog that allows you to add locations for, organize, and manage data sources, whether they are local, on a network, or cloud-based. Easy-to-use drop-downs and search capabilities allow you to find what you are looking for quickly and painlessly. As an added bonus, downloading annotation tracks or public data from the Golden Helix server (or your own network) can now run in the background, so you can continue your analysis in SVS instead of waiting for the application to become usable again.
Also now improved is the ability to use network sources for analysis and filtering, especially useful for DNA-seq data. In Version 7, you had to download these files before being able to use them in your workflow; eliminating this requirement saves the user time (and hard disk space!).
Another sequencing workflow improvement in Version 8 is the ability to filter on BED files directly without having to convert them to another file format. Simple and easy!
Version 8 introduces Haplotype Trend Regression to the family of analysis options available in SVS. Haplotype Trend Regression takes one or more blocks of genotypic markers and, for each block, estimates haplotypes for those markers. Then, for all (or all but one) of the haplotypes above the frequency threshold, it regresses each sample's haplotype probabilities against a dependent variable. With just a few clicks, Haplotype Trend Regression allows you to associate a disease or other phenotype with the haplotype frequencies of individuals.
The regression may be linear or logistic, may be stepwise if desired, and may involve fixed numeric or categorical covariates and/or interaction terms. The fixed covariates and interaction terms may either be regressed together with the by-sample haplotype probabilities ("full model") or may be grouped separately into a "reduced model." Permutation testing is also available.
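For the linear case, the full-versus-reduced comparison amounts to a nested-model F test. The following is a minimal sketch, assuming the haplotype probabilities for each sample have already been estimated (for example by EM) and that one reference haplotype has been dropped. The function names `solve`, `rss` and `htr_f_test` are hypothetical illustrations, not SVS's implementation, and logistic, stepwise and permutation options are left out.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(X: List[List[float]], y: Sequence[float]) -> float:
    """Residual sum of squares of an ordinary least-squares fit (normal equations)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))


def htr_f_test(hap_probs, covariates, y) -> float:
    """F statistic for full model (intercept + covariates + haplotype probabilities)
    versus reduced model (intercept + covariates only).

    hap_probs must already omit one reference haplotype: each sample's
    probabilities sum to 1 and would otherwise be collinear with the intercept.
    """
    reduced = [[1.0] + list(c) for c in covariates]
    full = [r + list(h) for r, h in zip(reduced, hap_probs)]
    rss_reduced, rss_full = rss(reduced, y), rss(full, y)
    df_num = len(full[0]) - len(reduced[0])
    df_den = len(y) - len(full[0])
    return ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
```

A large F means the haplotype probabilities explain variation in the phenotype beyond what the fixed covariates already explain. A p-value would come from the F distribution with (`df_num`, `df_den`) degrees of freedom, or from permutation.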
Also added to SVS Version 8 is the Fixation Index (Fst) by marker. In general, Fst measures the amount of genetic divergence between two or more subpopulations derived from an ancestral population. Estimates made using all markers for each sample have been available in SVS, but now you can compute Fst by marker to determine whether a particular SNP is indicative of population variation. All that is required to run Fst is genotypic data and a categorical grouping variable indicating subpopulation.
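Per-marker Fst compares heterozygosity within subpopulations to heterozygosity in the pooled population. The sketch below uses a simple Nei-style estimator, Fst = (H_T - H_S) / H_T, with subpopulations weighted by their number of called alleles. This estimator is an assumption chosen for illustration. SVS may use a different estimator, such as Weir and Cockerham's, and `fst_by_marker` is a hypothetical name.

```python
from collections import defaultdict


def fst_by_marker(genotypes, groups):
    """Nei-style Fst = (H_T - H_S) / H_T for one bi-allelic marker.

    genotypes: allele counts per sample (0, 1, 2, or None for a missing call).
    groups: subpopulation label per sample (the categorical grouping variable).
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [allele count, alleles called]
    for g, grp in zip(genotypes, groups):
        if g is None:
            continue
        counts[grp][0] += g
        counts[grp][1] += 2
    called = [(a / n, n) for a, n in counts.values() if n > 0]
    total = sum(n for _, n in called)
    p_bar = sum(p * n for p, n in called) / total
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # monomorphic marker: no divergence can be measured
    h_s = sum(2 * p * (1 - p) * n for p, n in called) / total  # mean within-group value
    return (h_t - h_s) / h_t
```

A marker fixed for different alleles in two groups gives Fst = 1. Identical allele frequencies in every group give Fst = 0.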
The DNA-Seq package of SVS has also received several new additions including:
As SVS's menus grew organically with each added feature and option, they became harder to navigate. Version 8 reorganizes the menus into intuitive groupings that make the product easier for newbies to learn and mean less "now-where-was-that-function-again" for SVS experts.
As anyone who uses SVS knows, managing and handling windows can become a big deal as each step of a workflow often creates a separate spreadsheet. Window management has now been added to SVS's project navigator, which allows for easy switching of windows, closing all plot or GenomeBrowse windows, closing all spreadsheet windows, or closing all windows of any type.
Specific to streamlining the DNA-Seq package, numerous separate tools for annotating and filtering variants have been merged into one simple multi-step dialog. In this dialog you can select multiple tracks at once and choose, for each track, whether the dataset will be filtered or simply annotated. When using more than one annotation track, you can also select the order in which the dataset is filtered or annotated to further fine-tune the results.
New in 7.7
The seventh release of SNP & Variation Suite (SVS) 7 includes an improved RNA-Seq Analysis Package for mRNA expression profiling, streamlined DNA-Seq workflows, and much more. Enjoy!
While the RNA-Seq Package of SVS was introduced in version 7.6, its functionality was limited. In version 7.7, the RNA-Seq Package has been expanded to provide an end-to-end solution for tertiary analysis.
Regardless of the upstream secondary analysis tool used to align and quantify reads into weighted counts, SVS provides all the data normalization, differential expression, and visualization techniques needed to be able to conduct RNA sequencing analysis quickly and easily, giving you everything you might expect from expression microarrays and more.
A core part of the RNA-Seq Package, the DESeq tool is designed to estimate the variance-mean dependence in count data and test for differential expression between conditions using a model based on the negative binomial distribution. SVS implements the latest version of DESeq.
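Before testing, DESeq normalizes for sequencing depth using median-of-ratios size factors, as described by Anders and Huber. For each sample, the size factor is the median, over all genes, of that gene's count divided by the gene's geometric mean across all samples. The plain-Python sketch below illustrates only this step. It is not SVS's code, and it leaves out DESeq's variance fitting and negative binomial test. The name `size_factors` is hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw read counts per sample.
    Genes with a zero count in any sample are skipped, since their
    geometric mean is zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors
```

To get normalized counts, divide each sample's counts by its size factor. A sample sequenced twice as deeply gets a size factor twice as large.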
After normalizing via DESeq or through the upper-quartile log transformation function, advanced visualization can be used to interpret the analysis. Getting to a standard volcano plot showing p-values versus fold-change is a cinch, and your top genes can be hierarchically clustered and plotted in a heatmap.
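A volcano plot places each gene at x = log2 fold change and y = -log10 p-value, so genes that are both strongly and significantly changed end up in the upper corners. The following is a minimal sketch of computing those coordinates. The input tuple layout, the positive normalized means, and the `alpha` and `min_abs_lfc` highlight thresholds are illustrative assumptions, not SVS defaults.

```python
import math


def volcano_points(results, alpha=0.05, min_abs_lfc=1.0):
    """Compute volcano plot coordinates.

    results: iterable of (gene, mean_control, mean_case, p_value), where the
    means are positive normalized expression values.
    Returns (gene, log2 fold change, -log10 p, highlighted) tuples.
    """
    points = []
    for gene, mean_a, mean_b, p in results:
        lfc = math.log2(mean_b / mean_a)
        neg_log_p = -math.log10(p)
        highlighted = p < alpha and abs(lfc) >= min_abs_lfc
        points.append((gene, lfc, neg_log_p, highlighted))
    return points
```

The highlighted genes are natural candidates to pass on to hierarchical clustering and a heatmap.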
To streamline DNA-Seq workflows, we added filtering functionality such as Filter on Non-Synonymous Functional Predictions, Filter on Variant Frequency Catalog, and others. We also expanded our capability to apply basic variant-quality-based filters in one step with the Set Genotypes to No-Call based on Additional Spreadsheets tool.
SVS now also offers the ability to count mapped genotypic columns in common between spreadsheets. If two or three spreadsheets are selected for comparison, a Venn diagram will be created for visual inspection.
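Counting columns in common between spreadsheets comes down to set operations on the mapped marker identifiers. The sketch below assumes each column is identified by a key such as "chr1:12345:A:G" (a hypothetical convention, not necessarily how SVS matches columns). For each exclusive Venn region, it counts the columns found in exactly that combination of spreadsheets.

```python
def venn_counts(named_sets):
    """named_sets: dict mapping spreadsheet name -> set of column identifiers.

    Returns {tuple of spreadsheet names: count}, where each key lists exactly
    the spreadsheets that contain those columns (one exclusive Venn region).
    """
    names = list(named_sets)
    regions = {}
    for item in set().union(*named_sets.values()):
        key = tuple(n for n in names if item in named_sets[n])
        regions[key] = regions.get(key, 0) + 1
    return regions
```

With three spreadsheets, the key `("A", "B", "C")` holds the columns shared by all three, and `("A",)` holds those unique to A.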
The complete trio analysis first introduced by Dr. Kenneth Kaufman in August 2012 was a benchmark for using SVS to filter sequence data. Now it's even better, having been streamlined to reduce the time it takes to filter to a short list of causal variants from two hours to 20 minutes or less per trio.
One of the biggest hassles for our customers working with small families was creating a pedigree file when they did not have one. To address this, we developed a tool to build a pedigree based on the row labels of a spreadsheet, quickly and easily.
The DNA-Seq Package of SVS can now be used for the following applications:
There are many more additions to SVS 7.7; here are a few of the notable ones:
The SVS welcome screen got a facelift with an area reserved for Technical Support Bulletins. This feed allows users to stay on top of support announcements, bug fixes, and new scripts for SVS.
SVS 7.7 also has an improved CGI import tool that, like the VCF import tool, lets you choose how to represent variants that are not called for a particular sample. Also, the Illumina Final Report importer has been completely rewritten to be faster and more computationally efficient, and it now allows you to import unmapped data, or data with a position of zero.
Finally, you can now add color flags to nodes in the Project Navigator as visual placeholders. Flags come in four colors and also indicate whether a node has a note attached to it. Flags also make it easy to show collaborators (or even yourself!) which spreadsheet you used to start your analysis and which contains the final results.
New in 7.6
The sixth release of the venerable SNP & Variation Suite (SVS) 7 includes even more improvements to DNA-Seq analysis capabilities, new configurations of the software to meet your exact needs, RNA sequencing analysis, and, for the first time since the release of SVS 7, new plot types. Enjoy!
SVS 7.4 brought you sequence analysis. Version 7.5 brought improved importing tools, variant map visualization, variant classification, and more robust NGS analysis methods. Version 7.6 brings you even more.
Many of the early success stories in NGS analysis have been studies of small nuclear families, including family trios with Mendelian diseases. SVS 7.6 includes several new tools to assist researchers pursuing similar small-family research designs.
If a rare autosomal recessive disease model is suspected, the DNA-Seq Package has a method called Score Variants by Recessive Model that will help rank rare variants that match the expected inheritance pattern between affected and optionally unaffected individuals.
Beyond supporting analysis of rare homozygous mutations, SVS also provides the ability to detect and score genes with compound heterozygous mutations in trios. These loci may cause a recessive trait when a different damaging variant within the same gene is inherited from each parent. For quality assurance purposes, SVS now offers a tool for detecting Mendelian inheritance errors in families. The Mendelian error tool is also useful for identifying possible de novo mutations.
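Both trio checks can be expressed as simple rules on genotypes coded as alternate-allele counts (0, 1, 2) at autosomal sites. The sketch below assumes no missing calls and assumes that damaging variants have already been selected upstream. `mendelian_error` and `compound_het_genes` are hypothetical names, and this is not the scoring SVS uses.

```python
def mendelian_error(child, mother, father):
    """True if the child's genotype cannot be formed from one allele of each parent.

    Genotypes are alternate-allele counts at an autosomal site. A child that is
    heterozygous while both parents are homozygous reference (a possible
    de novo mutation) is also flagged.
    """
    transmitted = {0: {0}, 1: {0, 1}, 2: {1}}  # alt alleles a parent can pass on
    possible = {m + f for m in transmitted[mother] for f in transmitted[father]}
    return child not in possible


def compound_het_genes(variants):
    """variants: iterable of (gene, child, mother, father) genotypes.

    Returns genes where the child is heterozygous for at least one variant
    inherited only from the mother and at least one inherited only from the father.
    """
    maternal, paternal = set(), set()
    for gene, child, mother, father in variants:
        if child != 1:
            continue
        if mother == 1 and father == 0:
            maternal.add(gene)
        elif father == 1 and mother == 0:
            paternal.add(gene)
    return maternal & paternal
```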
These tools are now included in the DNA-Seq package and will enable users to work with their familial data without having to purchase PBAT or venture outside of the SVS ecosystem.
Genome-wide association studies (GWAS) typically require large numbers of cases and controls to conduct association testing. Sequencing studies, on the other hand, often lack sequenced control samples and thus rely on reference genomes to determine significance. Genotype association testing, filtering, recoding, and statistics can now be performed in SVS based on either the traditional major/minor allele classification or a reference/alternate allele classification as determined by a reference genome build. Researchers without a large number of controls in their data can thus use the same utilities employed in large case/control studies by relying on the commonality of a reference genome.
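The two classifications differ only in which allele is counted. The small sketch below assumes genotypes are stored as "A/B" strings with None for missing calls (an illustrative format, not SVS's internal one). It shows that when the reference allele happens to be the in-sample minor allele, the two encodings give opposite counts. The function names are hypothetical.

```python
def allele_counts(genotypes, counted_allele):
    """Count copies of counted_allele in each 'A/B' genotype; None stays None."""
    return [None if g is None else g.split("/").count(counted_allele) for g in genotypes]


def minor_allele(genotypes):
    """In-sample minor allele; ties are broken alphabetically."""
    counts = {}
    for g in genotypes:
        if g is None:
            continue
        for allele in g.split("/"):
            counts[allele] = counts.get(allele, 0) + 1
    return min(counts, key=lambda a: (counts[a], a))


# Reference genome says "A" at this site; "G" is the alternate allele.
genotypes = ["A/G", "G/G", "G/G", None]
by_alternate = allele_counts(genotypes, "G")                 # reference/alternate coding
by_minor = allele_counts(genotypes, minor_allele(genotypes))  # major/minor coding
```

Here `by_alternate` is `[1, 2, 2, None]` and `by_minor` is `[1, 0, 0, None]`. The reference-based coding stays stable across studies, whereas the minor allele depends on which samples happen to be in the dataset.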
With the addition of an RNA-Seq package, a data management and visualization package, and a complete feature and functionality package, SVS now offers any researcher the ability to get the exact configuration they need to perform the fastest and most streamlined genetic research analysis they ever thought possible.
As researchers study the whole picture of genetic diseases and other traits, new methods and approaches are often needed as the field evolves. The sixth installment of SVS 7 marks Golden Helix’s expansion into RNA sequencing analysis with the introduction of a toolset similar to the DESeq R package developed by Simon Anders and Wolfgang Huber in 2010. The SVS RNA-Seq package performs differential expression analysis on RNA sequence count data to find genes that are over- or under-expressed in a given set of samples relative to a set of controls. Now, using SVS, researchers can conduct both DNA and RNA analysis without the hassle of managing different software and file types.
The SVS RNA-Seq package is a prelude to the release of the Golden Helix and EA pipeline, which will let users easily analyze their RNA data hosted in the cloud.
SVS is now offered in packages designed around specific applications, each including all the functionality necessary for that application. This bundling gives purchasers peace of mind, knowing that all key functionality for the type of work they do comes with the package.
Two new configurations of SVS give users more options in choosing how to analyze their data.
SVS Core Plus is the foundational package of SVS. It offers a robust platform for efficiently managing, manipulating, and visualizing large genomic datasets, regardless of where the data came from or where analysis took place.
The Power Seat contains the complete collection of SVS features and functionality from all of the packages, with the exception of PBAT. It also includes all new methods as they are released.
Sometimes just the right plot is all that is needed to understand an idiosyncrasy of your data, decide where to explore next, or communicate a concept with a colleague. SVS 7.6 integrates the Python Matplotlib library, which will allow new plot types to be added quickly and easily in the future. To begin, four new plot types were added: NxN scatter plots, proportion-by-group plots with confidence intervals, side-by-side box plots, and stacked histograms.
An NxN scatter plot allows multi-dimensional exploration of principal component analysis (PCA) results.
To view the distribution of numerous variables simultaneously, use a side-by-side box plot.
New in 7.5
The fifth installment of SNP & Variation Suite (SVS) 7 fills out the Sequence Analysis Module premiered in version 7.4, giving you more ways to explore and analyze your NGS data to identify variants that matter. The Genome Browser has also received a lot of attention, including the addition of a global and track-based search feature and the ability to immediately visualize the differences among two or more groups by grouping, filtering, and splitting graphs. Enjoy!
SVS 7.4 brought you an entirely new Sequence Analysis Module with the latest advances in tertiary or "sense making" analysis methods for whole-genome and whole-exome DNA next-generation sequencing. Version 7.5 makes it even better.
While version 7.4 introduced the ability to import VCF files standardized by the 1000 Genomes project, SVS 7.5 can now more efficiently import a wider variety of VCF files as well as variant files from Complete Genomics. Furthermore, the import tool has been overhauled to allow for the combination of file formats from various sources without having to worry about compatibility.
After importing, you may want to use the new bi-allelic expansion tool to encode multi-allelic variants (variants with two or more alternate alleles present at a given locus) as multiple bi-allelic variants. Because the mature workflows developed for microarray-based genotype analysis only support bi-allelic columns, multi-allelic variants were ignored or dropped in the past.
SVS can now expand the genotype columns into a single column for each alternate allele present in all samples at a given locus, allowing for more comprehensive analysis of NGS data.
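The expansion can be sketched in a few lines. The sketch takes VCF-style allele indices (0 = reference, k = k-th alternate) and emits one column per observed alternate allele, holding the number of copies of that allele each sample carries. Treating every other allele, reference or alternate, as "not this allele" within a column is an assumption made for illustration. SVS may recode such genotypes differently, and `expand_multiallelic` is a hypothetical name.

```python
def expand_multiallelic(genotypes, alt_alleles):
    """Split one multi-allelic site into bi-allelic columns.

    genotypes: per-sample (a1, a2) allele indices, VCF-style
               (0 = reference, k = k-th alternate), or None if missing.
    alt_alleles: alternate allele strings in VCF order, e.g. ["T", "G"].
    Returns {alt: [copies of that alt per sample]}, keeping only alternates
    observed in at least one sample.
    """
    columns = {}
    for k, alt in enumerate(alt_alleles, start=1):
        column = [None if gt is None else gt.count(k) for gt in genotypes]
        if any(column):  # None and 0 are falsy, so unobserved alternates are dropped
            columns[alt] = column
    return columns
```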
Variant map visualization provides a practical representation of large genotype spreadsheets in the context of sequencing variant analysis. With a quick glance at a variant map, where variants can be colored by allele or any categorical variable (e.g. variant classification), researchers can immediately see areas where sample groups differ, indicating possible sites for further analysis. Adding annotation tracks further illustrates a complete picture of variants, helping you better understand the relevance of significant findings.
Similar to the ANNOVAR program, variant classification examines the interactions between variants and gene transcripts in order to classify variants based on their potential effect on genes. Variants are classified according to their position in relation to gene transcripts (intronic, exonic, utr5, etc). In addition, variants in coding exons are further classified according to their effect on the gene's protein sequence (synonymous, non-synonymous, frameshifting, etc). This gives insight into which variants are most likely to have functional effects.
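The positional part of this classification can be sketched against a single transcript. The sketch assumes 1-based inclusive coordinates and a transcript described by its span, its exons and its coding (CDS) bounds. It deliberately leaves out splice-site handling, overlapping transcripts, and the codon-level synonymous/non-synonymous step. `classify_position` is a hypothetical name, not SVS's or ANNOVAR's API.

```python
def classify_position(pos, tx_start, tx_end, exons, cds_start, cds_end, strand="+"):
    """Classify a 1-based position relative to one transcript.

    exons: list of (start, end) inclusive genomic intervals.
    cds_start/cds_end: genomic bounds of the coding sequence.
    """
    if not tx_start <= pos <= tx_end:
        return "intergenic"
    if not any(start <= pos <= end for start, end in exons):
        return "intronic"
    if cds_start <= pos <= cds_end:
        return "exonic"
    # Untranslated exonic sequence: 5' vs 3' depends on the transcript strand.
    before_cds = pos < cds_start
    if strand == "-":
        before_cds = not before_cds
    return "utr5" if before_cds else "utr3"
```

Exonic variants would then go on to the protein-level step, which translates the affected codon before and after the change.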
Combined Multivariate and Collapsing (CMC) was introduced in SVS in February as an advanced method for analyzing NGS data. A second method from Leal's group, Kernel-Based Adaptive Clustering (KBAC) by Liu and Leal, has now also been added. KBAC differs from CMC in that variant classification and association testing are unified into a single procedure. KBAC models the risk associated with multi-site genotypes rather than collapsing individual genotypes into specified bins.
But we took them both one step further. Working in conjunction with Baylor College of Medicine, the Golden Helix product team implemented both CMC and KBAC in SVS in a regression framework, allowing for quantitative phenotypes and correction for covariates and confounders. These new versions are more powerful and, through the use of permutation testing, produce fewer false positives. Either approach gives greater power to detect the significance of rarer variants.
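In CMC (Li and Leal), the rare variants within a gene or region are collapsed into a single carrier indicator, while common variants are kept as individual predictors. The sketch below shows only this collapsing step. The resulting rows would then feed the association test or, in the regression version described above, a regression model with covariates. The 1% rarity threshold and the name `cmc_collapse` are illustrative assumptions.

```python
def cmc_collapse(genotypes, mafs, rare_threshold=0.01):
    """Build CMC-style predictor rows for one gene or region.

    genotypes: samples x variants matrix of alternate-allele counts.
    mafs: minor allele frequency of each variant.
    Returns one row per sample: the common variants as-is, followed by a
    single indicator that is 1 if the sample carries any rare variant.
    """
    rare = [j for j, f in enumerate(mafs) if f < rare_threshold]
    common = [j for j, f in enumerate(mafs) if f >= rare_threshold]
    rows = []
    for sample in genotypes:
        features = [sample[j] for j in common]
        features.append(1 if any(sample[j] > 0 for j in rare) else 0)
        rows.append(features)
    return rows
```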
A cornerstone of SVS, the Genome Browser received two substantial upgrades in version 7.5, in addition to the variant maps discussed above.
In the age of Google, users now demand powerful searching capabilities in every tool they use. Responding to this cry, our engineers built a powerful search engine into the ever-more-powerful Genome Browser. Search for your favorite gene, and you're right there. Search for an RS id, and you're right there. Search for a reference in an annotation track, and you're right there. Navigation has never been so fun.
With SVS 7.5, users can group and filter variables dynamically "on the fly" to look at multiple dimensions of specificity and perform quality assurance on their data. Previously, filtering in the Genome Browser was limited to one variable at a time. Version 7.5 eliminates this constraint to empower visualization to the nth level, saving you time and allowing for a more exploratory experience as you cluster, sort, categorize, and dig in based on the results you see in real time.
New in 7.4
With over 30 new features, the fourth installment of SNP & Variation Suite 7 empowers you to explore your data as never before. Identify rare causal variants with a new Sequence Analysis Module. Improve your GWAS and CNV results with state-of-the-art workflows. Dramatically increase your productivity. And gain greater insights with the most advanced genome browser. This is just a taste of what you'll discover on your way to more impactful research. Enjoy!
SVS 7.4 brings you an entirely new Sequence Analysis Module with the latest advances in tertiary or "sense making" analysis methods for whole-genome and whole-exome DNA next-generation sequencing. For the first time in a single, integrated desktop solution, you will be able to quickly analyze millions of common and rare variants from tens to thousands of samples to assess their impact on inherited traits.
Zoomed-in view of the human reference genome and SIFT Prediction track.
Targeted resequencing, whole-exome, or whole-genome. It doesn't matter. With SVS you can efficiently manage, analyze, and interactively explore millions of variants for thousands of samples.
SVS makes importing variant calls easy with streamlined import and mapping of the most common and standardized formats, such as Variant Call Format (VCF) files and SoapSNP output from the Beijing Genomics Institute.
Using genomic annotation tracks such as dbSNP, SIFT, 1000 Genomes, and more, SVS enables you to quickly and easily sift through millions of variants to filter out those that are common, benign, poorly covered, or don’t matter for your study. You can also use case-control or familial data to identify variants that are unique to affected individuals only.
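The filtering logic described here can be sketched as a chain of per-variant rules. The sketch below makes several assumptions. The `Variant` fields, the 1% population-frequency cutoff and the minimum depth of 10 are illustrative. The SIFT rule uses the conventional reading that scores above 0.05 are predicted tolerated. Missing annotations keep the variant. None of these names are SVS's API.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Variant:
    chrom: str
    pos: int
    depth: int                    # read depth at the site
    pop_af: Optional[float]       # population allele frequency, e.g. from 1000 Genomes
    sift_score: Optional[float]   # SIFT score; lower means more likely damaging


def keep_variant(v: Variant, max_af=0.01, sift_cutoff=0.05, min_depth=10) -> bool:
    """Drop variants that are poorly covered, common, or predicted benign."""
    if v.depth < min_depth:
        return False  # poorly covered
    if v.pop_af is not None and v.pop_af > max_af:
        return False  # common in the reference population
    if v.sift_score is not None and v.sift_score > sift_cutoff:
        return False  # predicted tolerated
    return True


def unique_to_affected(carriers: Set[str], affected: Set[str]) -> bool:
    """True if the variant is carried by at least one sample, all of them affected."""
    return bool(carriers) and carriers <= affected
```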
SVS gives you the power to assess the impact of rare variants on your trait of interest when traditional association techniques don't apply. Find genes or regions with an abundance of variants in your sample set. Classify the rarity of variants when your sample size is too small to calculate in-sample minor allele frequency. Assess rare variant burden using powerful collapsing and association methods, including the Combined Multivariate and Collapsing (CMC) method from Li and Leal. And understand the contribution of rare variants with functional prediction.
Laurie et al. from Bruce Weir's group at the University of Washington recently published a definitive paper on quality control and quality assurance methods in genome-wide association studies. We challenged ourselves to provide you with every method they covered. We did that and then some. In addition to the already comprehensive quality assurance procedures available in SVS, here's what's new in 7.4.
Heat map of Identity by Descent matrix sorted by bovine breed.
Related individuals wreak havoc on association tests where independence is assumed. Identity by descent (right) and inbreeding coefficient calculations help you control for unknown or cryptic relatedness in your samples.
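A common way to estimate a sample's inbreeding coefficient is the method-of-moments form F = (O - E) / (N - E). Here O is the observed number of homozygous genotypes, E is the number expected under Hardy-Weinberg given population allele frequencies, and N is the number of called markers. This is the same form PLINK reports for its heterozygosity check, but SVS's exact computation may differ. `inbreeding_f` is a hypothetical name.

```python
def inbreeding_f(sample_genotypes, allele_freqs):
    """Method-of-moments inbreeding coefficient for one sample.

    sample_genotypes: alternate-allele counts (0, 1, 2, or None if missing).
    allele_freqs: population alternate-allele frequency per marker.
    """
    obs_hom = 0
    exp_hom = 0.0
    n = 0
    for g, p in zip(sample_genotypes, allele_freqs):
        if g is None:
            continue
        n += 1
        obs_hom += g != 1                 # homozygous call (True counts as 1)
        exp_hom += 1 - 2 * p * (1 - p)    # expected homozygosity under HWE
    return (obs_hom - exp_hom) / (n - exp_hom)
```

Values near 0 are typical for outbred samples. Clearly positive values suggest inbreeding, and strongly negative values suggest excess heterozygosity, such as contamination.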
To obtain better results when running certain tests, you can quickly filter (prune) correlated markers prior to analysis.
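A typical pruning pass walks along the markers in genomic order and drops any marker that is too highly correlated with a marker already kept nearby. The following is a minimal greedy sketch, assuming complete allele-count vectors and measuring LD as the squared Pearson correlation of genotype counts. The window (counted in markers) and r² threshold are illustrative, not SVS defaults.

```python
def r_squared(x, y):
    """Squared Pearson correlation of two allele-count vectors (no missing data)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no correlation information
    return sxy * sxy / (sxx * syy)


def ld_prune(markers, window=50, threshold=0.5):
    """Greedy LD pruning.

    markers: per-marker allele-count vectors in genomic order.
    Keeps a marker only if its r^2 with every kept marker among the previous
    `window` markers is at or below `threshold`. Returns kept indices.
    """
    kept = []
    for i, x in enumerate(markers):
        if all(r_squared(markers[j], x) <= threshold for j in kept if i - j <= window):
            kept.append(i)
    return kept
```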
Identifying outliers in autosome heterozygosity helps detect contaminated DNA samples (and population stratification in some cases).
Several new methods make it easy to verify that a sample’s reported gender is consistent with its inferred gender. These include X chromosome heterozygosity on genotypes, plotting X versus Y intensity values and averaging log ratio values of the X chromosome (especially helpful for identifying gender anomalies).
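The genotype-based check relies on males being hemizygous for the non-pseudoautosomal X, so their X genotype calls appear homozygous and their X-chromosome inbreeding coefficient is close to 1. The sketch below assumes male X genotypes are stored as homozygous calls. It uses the thresholds F ≥ 0.8 for male and F ≤ 0.2 for female, which are commonly used defaults (PLINK's, for example) rather than values taken from SVS. `check_sex` is a hypothetical name.

```python
def x_inbreeding_f(x_genotypes, x_freqs):
    """Method-of-moments F on non-pseudoautosomal X markers (alt-allele counts)."""
    obs_hom, exp_hom, n = 0, 0.0, 0
    for g, p in zip(x_genotypes, x_freqs):
        if g is None:
            continue
        n += 1
        obs_hom += g != 1
        exp_hom += 1 - 2 * p * (1 - p)
    return (obs_hom - exp_hom) / (n - exp_hom)


def check_sex(reported, x_genotypes, x_freqs, female_max=0.2, male_min=0.8):
    """Return (inferred sex or None if ambiguous, whether it matches `reported`)."""
    f = x_inbreeding_f(x_genotypes, x_freqs)
    if f >= male_min:
        inferred = "M"
    elif f <= female_max:
        inferred = "F"
    else:
        inferred = None
    return inferred, inferred == reported
```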
PCA plot of study population with outliers identified in green based on multidimensional outlier detection.
Calculating the inter-quartile range (IQR) of a numeric distribution is useful for determining outliers for many quality assurance measurements.
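A standard way to turn the IQR into an outlier rule is Tukey's fences: flag values below Q1 - k·IQR or above Q3 + k·IQR, with k = 1.5 as a common default. The sketch below applies this to a list of per-sample values such as heterozygosity rates. It uses Python's `statistics.quantiles`, whose default quartile method may differ slightly from the one SVS uses. `iqr_outliers` is a hypothetical name.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Example: per-sample heterozygosity rates with one suspicious sample.
het_rates = [0.30, 0.31, 0.29, 0.32, 0.30, 0.31, 0.45]
flagged = iqr_outliers(het_rates)
```

Here `flagged` is `[6]`, identifying the sample whose heterozygosity is unusually high, the pattern expected from a contaminated DNA sample.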
An extension of quartile summary statistics, multidimensional outlier detection lets you identify outliers on multiple dimensions, such as samples whose ethnicity does not match that of your study population when examining two or more principal components.
Derivative log ratio spread (DLRS) is a measurement of point-to-point consistency or noisiness in log ratio (LR) data. It correlates with low call rates and over/under abundance of identified copy number segments. Samples with higher values of DLRS tend to have poor signal-to-noise properties and are good candidates to exclude from analysis.
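A common definition of DLRS is the standard deviation of the differences between consecutive log ratios (in genomic order), divided by √2. Differencing cancels out genuine copy number steps, which affect only a few differences. If the noise is independent with standard deviation σ, each difference has standard deviation σ√2, so dividing by √2 gives an estimate of σ. This is a minimal sketch of that definition. SVS may use a robust variant (for example, an IQR-based spread), and `dlrs` is a hypothetical name.

```python
import math
from statistics import stdev


def dlrs(log_ratios):
    """Derivative log ratio spread for one sample (needs at least 3 values).

    log_ratios: log ratio values sorted by genomic position.
    """
    diffs = [b - a for a, b in zip(log_ratios, log_ratios[1:])]
    return stdev(diffs) / math.sqrt(2)
```

A sample with a clean copy number step scores lower than one whose probes bounce from point to point, even though both have a similar overall range of values.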
Detecting large chromosomal aberrations is both a quality assurance step and an analysis step. For example, by averaging log ratios across all autosomal chromosomes you can quickly detect cell line artifacts. But you may also be able to detect large aberrations that are instrumental in detecting disease-causing loci.
Comparing "good" log ratios (top) versus log ratios with a wave effect (bottom).
Genomic waves are ubiquitous in copy number data and can cause inaccuracies with any copy number detection algorithm. SVS employs the method of Diskin et al. (2008) to help you both detect and correct for genomic waves.
Percentile-based winsorizing can be used to prevent segmentation algorithms from being driven by outlier values, resulting in a more accurate determination of regions of copy number variation.
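Winsorizing clamps every value to lie between a lower and an upper percentile, so a few extreme probes cannot pull a segment boundary toward them. The following is a minimal sketch using a simple index-based percentile on the sorted values. The 1st/99th percentile defaults are illustrative, and SVS's percentile definition may differ.

```python
def winsorize(values, lower_pct=1.0, upper_pct=99.0):
    """Clamp values to the [lower_pct, upper_pct] percentile range."""
    s = sorted(values)

    def percentile(q):
        # Index-based percentile on the sorted values.
        return s[round(q / 100 * (len(s) - 1))]

    low, high = percentile(lower_pct), percentile(upper_pct)
    return [min(max(v, low), high) for v in values]
```

Unlike trimming, winsorizing keeps every probe and its position. It only limits how far an outlier can deviate, which leaves the number of data points the segmentation sees unchanged.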
Major enhancements to key copy number analysis workflows help you get the most accurate and informative results significantly faster than before.
By using your computer’s video graphics card, which acts like a mini compute cluster for a fraction of the price, CNV segmenting that used to take hours or days can now be completed in minutes, without compromising accuracy. Internal benchmark tests have shown 5-20x speed increases for univariate segmenting on a GPU compared with the CPU. Even more exciting is the 10-100x speed increase for multivariate segmenting, which, admittedly, was nearly impossible to use before.
Let's face it. Importing Affymetrix CEL files in the past was a pain. Not so anymore. We have completely revamped Affymetrix CEL file import to be much more streamlined and versatile. You can now easily select all samples as the reference without building a reference spreadsheet. You can also choose to use pre-computed HapMap populations as references. Based on the type of CEL files you're importing, SVS will also automatically identify the proper marker map and annotation files you need. If you don't have them, it will automatically download them for you. And for downstream analysis you have more flexibility in choosing the type of data you can import.
The Affymetrix Cytogenetic Whole Genome 2.7M Array is now fully supported, with enhanced CEL and CYCHP import, downloadable marker maps and library files, and access to pre-computed normalization data built on 485 samples so that you can normalize log ratios on a sample-by-sample basis.
Heat map of univariate segmentation results.
A number of new methods and enhancements are also available once you segment your log ratio data. You can discretize your segment covariates and segment list spreadsheets to categorize segment means into two or three state models. This helps magnify small, statistically significant differences between cases and controls and reduces the influence of outliers. You can also assess the overabundance of segments per sample. An unusually large number of segments is often indicative of data quality problems such as wave effects.
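Discretizing segment means and counting segments per sample are both straightforward. The sketch below makes these assumptions. The ±0.3 log-ratio thresholds are illustrative. The three-state model is loss (-1) / neutral (0) / gain (1). The two-state model is taken to mean "any aberration" (1) versus "neutral" (0), which is one possible reading of a two-state model and not necessarily SVS's. The function names are hypothetical.

```python
from collections import Counter


def discretize(segment_means, loss=-0.3, gain=0.3, three_state=True):
    """Map segment mean log ratios to discrete copy number states."""
    states = []
    for m in segment_means:
        s = -1 if m <= loss else 1 if m >= gain else 0
        states.append(s if three_state else int(s != 0))
    return states


def segments_per_sample(segments):
    """segments: iterable of (sample_id, chrom, start, end, mean) tuples."""
    return Counter(seg[0] for seg in segments)
```

Unusually high counts from `segments_per_sample` are the pattern to look for when screening for wave effects or other data quality problems.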
After the successful launch of the SVS Genome Browser in v7.3, we immediately began making it more powerful and flexible. Although we have worked hard to provide you with a bountiful set of public reference data through our on-demand network track feature, we realized that we needed to put the power into your hands to convert, create, and visualize any potential annotation information that can help you understand your data.
Annotation track manager with import from Wiggle file selected.
You can now easily customize the genome browser with annotation tracks that matter to you. Support for 2Bit, Wiggle, FASTA, and tabular files enables you to import your own custom annotations or tables from popular online databases such as UCSC, RefSeq, and dbGaP. You can also create any type of annotation from an SVS spreadsheet or download network annotation tracks from Golden Helix and store them locally for speed and efficiency.
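As an illustration of what a Wiggle import involves, the sketch below reads the two main Wiggle data formats. `variableStep` lines give "position value" pairs. `fixedStep` lines give only values, starting at `start` and advancing by `step`. Positions are 1-based, and `span` defaults to 1. This is a minimal sketch, not SVS's importer. It skips `track`/`browser`/comment lines and does not handle bedGraph-style data.

```python
def parse_wiggle(lines):
    """Yield (chrom, start, span, value) records from variableStep/fixedStep Wiggle data."""
    mode, params, pos = None, {}, 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith(("variableStep", "fixedStep")):
            fields = line.split()
            mode = fields[0]
            params = dict(f.split("=", 1) for f in fields[1:])
            if mode == "fixedStep":
                pos = int(params["start"])
            continue
        span = int(params.get("span", 1))
        if mode == "variableStep":
            start, value = line.split()
            yield params["chrom"], int(start), span, float(value)
        elif mode == "fixedStep":
            yield params["chrom"], pos, span, float(line)
            pos += int(params["step"])
```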
You now have immediate access to several new annotation tracks from our network server, including probe tracks for dbSNP builds 129, 130, and 131, SNPs catalogued from the 1000 Genomes project, miRNA, and Affymetrix MIP and Cytogenetics array annotations. For rare variant analysis, a SIFT track is also available with predictions of how likely a mutation at a given locus is to be damaging.
Whether you're studying human genetics on newer or older builds, or one of many plant and animal species, you can now set the default genome so that you don't have to switch the build every time you open a plot. You can also set default annotation tracks to appear every time a genome browser is opened.
Now included with Python in SVS are the mature statistical and numeric methods packages of NumPy and SciPy, giving SVS a broad base of standardized test statistics and linear algebra. Now both you and our own bioinformaticians supporting you can quickly adapt methods and build custom analyses to solve any unique challenges you encounter. Combined with the powerful interactive features of SVS, Python scripts using these packages are first-class features with polished interfaces, interactions, and logging support. In fact, the Combined Multivariate and Collapsing method was entirely developed in Python!
Several enhancements and new additions will make SVS easier to use and learn. Download complete projects to help learn new analysis tricks and plotting techniques. Access pre-processed public data such as 1000 Genomes and HapMap to use as references. And easily download a full assortment of Affymetrix and Illumina marker maps. You'll also find a redesigned, more intuitive Regression Analysis window, as well as handy dimension data at the top of every spreadsheet so you always know exactly how many samples and variables are represented without having to scroll.
New in 7.3
Fully Integrated and Interactive Genome Browser
Static genome browsers are a thing of the past. SVS 7.3 delivers fast, exploratory analysis of your data and genomic annotations simultaneously in a single, coherent view. Real-time network access to an expanding list of annotation tracks, such as RefSeq Genes, OMIM, GWAS Catalogue, and DGV, ensures you spend more time doing science than operating software.
Faster, More Powerful Runs of Homozygosity Analysis
Comparing regions of the genome where long stretches of homozygous markers (Runs of Homozygosity) are present or absent can help identify rare variants involved in recessive, penetrant disorders. SVS 7.3 delivers a faster ROH algorithm with more control over parameters, allowing the detection of longer, more biologically meaningful runs. Further, enhanced outputs for visualization and whole genome homozygosity association offer more ways to not only locate ROH regions that differ between groups but also assess the significance of those differences.
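At its simplest, ROH detection scans one sample's genotypes along a chromosome and reports stretches of consecutive homozygous calls that are long enough in both marker count and base pairs. The sketch below is a strict version. Any heterozygous or missing call ends a run, and the `min_markers`/`min_length_bp` defaults are illustrative. SVS's algorithm exposes more parameters, such as tolerating occasional heterozygous or missing calls. `runs_of_homozygosity` is a hypothetical name.

```python
def runs_of_homozygosity(genotypes, positions, min_markers=25, min_length_bp=500_000):
    """Find runs of homozygosity for one sample on one chromosome.

    genotypes: alternate-allele counts (0, 1, 2, or None) in positional order.
    positions: base-pair positions matching `genotypes`.
    Returns (start_pos, end_pos, n_markers) for qualifying runs.
    """
    runs = []
    start = None
    for i, g in enumerate(list(genotypes) + [1]):  # sentinel het closes a trailing run
        if g in (0, 2):
            if start is None:
                start = i
        elif start is not None:
            n_markers = i - start
            length = positions[i - 1] - positions[start]
            if n_markers >= min_markers and length >= min_length_bp:
                runs.append((positions[start], positions[i - 1], n_markers))
            start = None
    return runs
```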
Enhanced Data Support for Copy Number and Cytogenetic Research
SVS now offers a full suite of copy number and cytogenetic research tools for all major aCGH and SNP microarray platforms, including Affymetrix, Agilent, Nimblegen and Illumina. New in SVS 7.3 is streamlined import of Nimblegen Data Summary Files and Affymetrix's Cytogenetics Whole-Genome 2.7M and Molecular Inversion Probe (MIP) Arrays.
The need for high-performance analytics extends beyond humans when it comes to advanced genetic research. With improved data import and the ability to switch among an expanding list of genomes, SVS 7.3 makes accessible the full power of its analytic and visualization tools to a growing community of researchers studying non-human genetics.
Enhanced Plotting Controls
Creating captivating visualizations just got a whole lot easier. SVS 7.3 offers more control over how images are displayed, saved, and shared, as well as the ability to add as many graphs as you like to a single view from any data source in your project without having to first merge spreadsheets. Combined with annotation tracks from the new Genome Browser, the views you create are sure to make your colleagues jealous.
Accelerated and Enhanced PBAT Analysis
With SVS 7.3 we continue our dedication to working collaboratively with Dr. Christoph Lange of Harvard University School of Public Health to deliver the fastest, most powerful version of PBAT yet. Enhancements include accelerated performance, less restrictive parameters, and more options for family-based association testing.
New in 7.2
SVS now provides integrated tools for the design and analysis of family-based association studies through an exclusive version of the PBAT software package developed by Dr. Christoph Lange of Harvard University's School of Public Health. PBAT incorporates virtually all of the features of the FBAT package, also released by Harvard, and adds many options for designing association/linkage studies and analyzing data with multiple continuous traits.
The latest version of PBAT incorporates a novel test that assesses the genotyping quality of individual probands in family-based association studies. Published in PLoS Genetics [Fardo, 2009], these tests are “ideally suited as the final layer of quality control filters in the cleaning process of genome-wide association studies.” You can also assess Mendelian errors, Hardy-Weinberg equilibrium, and call rates per marker.
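A per-marker Hardy-Weinberg check compares observed genotype counts with the counts expected from the marker's allele frequencies. Below is a minimal sketch of the classical chi-square version, assuming a biallelic marker with known genotype counts. The function name is hypothetical, and SVS or PBAT may use a different test (for example, an exact test), so treat this only as an illustration of the idea.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium at one
    biallelic marker. Returns (chi-square statistic, p-value with 1 df)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip empty expected classes (monomorphic markers) to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


print(hwe_chi_square(25, 50, 25))  # (0.0, 1.0)
```

A marker with no heterozygotes at all, such as counts of 50, 0, and 50, yields a chi-square of 100 and a vanishingly small p-value, which is the kind of departure that flags a possible genotyping problem.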
A new plotting option enables you to generate heat maps – two-dimensional intensity plots of numeric values – from a spreadsheet. Heat maps are useful for identifying non-random patterns in your data. In addition to other applications, they can be helpful in identifying samples, or groups of samples, with copy number losses and gains. Heat maps can also be plotted alongside other numeric plots (e.g. p-values, CNV segmentation results) as well as LD plots.
Also included in the latest version is a global sample test to detect departures from Hardy-Weinberg Equilibrium within a single proband or case in a population-based association study. This test is especially valuable for genome-wide association studies.
Plots can now be more easily customized for publication, printing, and outputting to PDF with new print and image preview capabilities. Increase the scale and quality of an image, include Full Domain and Genome Track views, save to a variety of graphic formats and more.
SVS 7.2 Release Notes
New in 7.1
Interactively explore LD and haplotype analysis in an innovative and powerful new interface. You can view LD plots from one or more populations and explore them side-by-side with association results. For haplotype analysis it is easy to define and modify haplotype blocks from an LD plot or spreadsheet, compute haplotype and diplotype frequency tables, and perform a number of haplotype association tests, including per-block and per-haplotype methods.
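LD plots like these are built from pairwise statistics such as D' and r². As a hedged sketch, assuming the four two-locus haplotype frequencies are already known (in practice they are usually estimated from unphased genotypes, for example with an EM algorithm), the standard definitions can be computed as follows. The function name is hypothetical and this is not SVS's code.

```python
from typing import Tuple


def ld_statistics(f_AB: float, f_Ab: float, f_aB: float, f_ab: float) -> Tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic loci from the four haplotype
    frequencies, which are assumed to sum to 1."""
    p_A = f_AB + f_Ab  # frequency of allele A at locus 1
    p_B = f_AB + f_aB  # frequency of allele B at locus 2
    d = f_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("LD is undefined when either locus is monomorphic")
    # D' scales D by its largest possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / denom
    return d, d_prime, r2


print(ld_statistics(0.5, 0.0, 0.0, 0.5))  # (0.25, 1.0, 1.0)
```

With only the AB and ab haplotypes present, the two loci are in complete LD, so both |D'| and r² equal 1.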
Achieve better precision and accelerated speed for detecting copy number variation. CNAM Optimal Segmenting now incorporates a new parallelized, unbiased randomization permutation procedure that uses all available cores on your computer. The new permutation procedure replaces a naïve, potentially biased randomization procedure with the unbiased Fisher and Yates method (also known as the Knuth shuffle). An added option allows you to further refine your segments by efficiently removing univariate outliers during the segmentation process.
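The Fisher-Yates (Knuth) shuffle draws each swap partner only from the not-yet-fixed part of the array, which makes all n! orderings equally likely. The naive variant, which swaps every position with a uniformly random index from the whole array, does not. Below is a minimal, single-threaded sketch of the shuffle driving a permutation test, assuming a hypothetical statistic (the absolute difference in mean log ratio on either side of a candidate segment boundary). It illustrates the technique only and is not CNAM's parallelized implementation.

```python
import random
from typing import List, MutableSequence, Sequence


def fisher_yates_shuffle(items: MutableSequence, rng: random.Random) -> None:
    """Shuffle `items` in place so that every permutation is equally likely."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # uniform over 0..i inclusive
        items[i], items[j] = items[j], items[i]


def mean_difference(values: Sequence[float], split: int) -> float:
    """Hypothetical statistic: |mean(left) - mean(right)| around index `split`."""
    left, right = values[:split], values[split:]
    return abs(sum(left) / len(left) - sum(right) / len(right))


def permutation_p_value(values: Sequence[float], split: int,
                        n_perm: int = 1000, seed: int = 0) -> float:
    """Estimate how often shuffled data give a statistic at least as extreme
    as the observed one."""
    rng = random.Random(seed)
    observed = mean_difference(values, split)
    shuffled: List[float] = list(values)
    hits = 0
    for _ in range(n_perm):
        fisher_yates_shuffle(shuffled, rng)
        if mean_difference(shuffled, split) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never 0.
    return (hits + 1) / (n_perm + 1)


log_ratios = [0.0] * 10 + [1.0] * 10
print(permutation_p_value(log_ratios, split=10, n_perm=200))
```

Because each permutation is independent of the others, the loop is straightforward to split across cores, which is the property a parallelized procedure can exploit.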
The time required for iterative use of Principal Component Analysis (PCA) has been significantly reduced by enabling the “recycling” of pre-computed principal components. This lets you run PCA once and then reuse the principal components in subsequent analyses instead of performing the time-consuming computation each time. Further, new data centering options, by marker and by sample, are now available for numeric data values (such as log ratios), improving the calculation of and correction for principal components.
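The centering options and the reuse of precomputed components can be sketched as follows, assuming rows are samples and columns are markers. Centering by marker subtracts each column's mean; centering by sample subtracts each row's mean. The `remove_components` helper assumes the saved principal components are orthonormal vectors over samples and removes each marker's projection onto them, which is one common way to apply a PCA correction. SVS's exact correction procedure is not described here, and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = markers (assumed layout)


def center_by_marker(data: Matrix) -> Matrix:
    """Subtract each marker's (column's) mean across samples."""
    n_samples = len(data)
    means = [sum(row[j] for row in data) / n_samples for j in range(len(data[0]))]
    return [[x - m for x, m in zip(row, means)] for row in data]


def center_by_sample(data: Matrix) -> Matrix:
    """Subtract each sample's (row's) mean across markers."""
    centered = []
    for row in data:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    return centered


def remove_components(data: Matrix, components: List[List[float]]) -> Matrix:
    """Remove each marker column's projection onto precomputed components.

    Assumes `components` are orthonormal vectors of length n_samples, e.g.
    saved from an earlier PCA run, so they can be reused without recomputing.
    """
    n_samples, n_markers = len(data), len(data[0])
    corrected = [row[:] for row in data]
    for j in range(n_markers):
        column = [corrected[i][j] for i in range(n_samples)]
        for v in components:
            weight = sum(c * vi for c, vi in zip(column, v))
            column = [c - weight * vi for c, vi in zip(column, v)]
        for i in range(n_samples):
            corrected[i][j] = column[i]
    return corrected


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
print(center_by_marker(data))  # [[-2.0, -3.0], [0.0, -1.0], [2.0, 4.0]]
print(center_by_sample(data))  # [[-0.5, 0.5], [-0.5, 0.5], [-2.0, 2.0]]
# A constant unit vector: removing it is equivalent to centering by marker.
unit = [1 / math.sqrt(3)] * 3
print(remove_components(data, [unit]))
```

Storing the component vectors once and passing them to later corrections is what avoids repeating the eigen-decomposition for every analysis.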
Support for importing and exporting PED, TPED, and BED file formats makes it easy to move your data back and forth between SVS and other genetic analysis software, such as PLINK.
For a variety of applications, such as imputation and meta-analysis, it is important that two or more datasets represent alleles from the same strand for a given set of markers. Marker maps for Affymetrix and Illumina data (when exporting as Golden Helix DSF from BeadStudio) now include fields for top and bottom strand alleles. This enables you to transcode all genotypic markers from AB to ACGT format based on either strand, ensuring consistency across two or more datasets.
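Transcoding is a per-marker lookup: the A and B alleles stored for the chosen strand replace the letters of each AB call. A minimal sketch follows. The marker map layout, the marker names, and the `??` missing-call code are hypothetical, chosen only for illustration, and the complement helper simply shows how the same call reads on the opposite strand.

```python
from typing import Dict, Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def transcode_ab(call: str, allele_a: str, allele_b: str, missing: str = "??") -> str:
    """Convert an AB-format call (e.g. "AB") to nucleotides using the
    A and B alleles recorded for one strand in the marker map."""
    if call == missing:
        return missing
    mapping = {"A": allele_a, "B": allele_b}
    return "".join(mapping[c] for c in call)


def flip_strand(call: str) -> str:
    """Report the same call on the opposite strand (base-by-base complement)."""
    return "".join(COMPLEMENT[c] for c in call)


# Hypothetical marker map: marker -> (A allele, B allele) on the chosen strand.
marker_map: Dict[str, Tuple[str, str]] = {"marker_1": ("A", "G"), "marker_2": ("C", "T")}
genotypes = {"marker_1": "AB", "marker_2": "BB"}

acgt = {m: transcode_ab(g, *marker_map[m]) for m, g in genotypes.items()}
print(acgt)                           # {'marker_1': 'AG', 'marker_2': 'TT'}
print(flip_strand(acgt["marker_1"]))  # TC
```

Transcoding every dataset against the same strand's alleles is what makes calls from different sources directly comparable.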
Regression results are now more informative with several new regression outputs added to the results spreadsheet (when regressing once on each data column). This makes it easy to both sort and plot on a number of regression-based statistics. Selecting Allele Frequencies under Genotype Statistics now displays the minor and major alleles, in addition to their frequencies, for each genotypic marker.
SVS 7.1 Release Notes
New in 7.0
Anticipating association studies with hundreds of millions of data points generated per sample by next generation sequencing, the core architecture of SVS 7 has been completely reinvented to efficiently handle datasets of virtually any size on a desktop computer. Smart memory management and data caching ensure you will experience accelerated performance at every step.
Seeing is believing with an intuitive interface that puts your data in genomic context at every step. Discover how rewarding it is to navigate whole genome data live within a spreadsheet, complete with genomic annotations, or visually in a genome browser. For follow-up analyses, you can quickly look up significant markers in supported online databases. More consistent workflows make performing complex analyses quick, easy, and efficient.
Find more associations with the most extensive collection of genetic association tests, including allele, genotype, haplotype, copy number variation, runs of homozygosity, multi-locus, LD, and regression-based testing. Many tests can be run individually or simultaneously while also controlling for false positives by employing multiple testing corrections and permutation testing. Additional outputs of expected values enable you to generate Q-Q and P-P plots.
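Under the null hypothesis, p-values are uniformly distributed, so a Q-Q plot pairs the sorted observed p-values with uniform quantiles. A minimal sketch, assuming the common i/(n + 1) plotting positions on a -log10 scale (SVS may compute its expected values differently) and strictly positive p-values; the function name is hypothetical:

```python
import math
from typing import List, Sequence, Tuple


def qq_points(p_values: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair each observed p-value with its expected value under the null
    (uniform quantiles i / (n + 1)), both on the -log10 scale.
    Assumes every p-value is > 0 so the logarithm is defined."""
    observed = sorted(p_values)  # smallest p first
    n = len(observed)
    points = []
    for i, p in enumerate(observed, start=1):
        expected = i / (n + 1)
        points.append((-math.log10(expected), -math.log10(p)))
    return points


print(qq_points([0.5, 0.01, 0.2]))
```

Points that rise well above the diagonal at the high end suggest true associations, while a systematic upward shift across the whole plot can indicate inflation from stratification or other artifacts.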
SVS 7 offers a complete workflow for copy number analysis and related CNV association studies. Process raw intensity data and simultaneously correct for batch effects, genomic waves and population stratification, while significantly improving signal-to-noise ratios. Employ optimal segmentation to detect copy number segment boundaries both on a per-sample (univariate) and multi-sample (multivariate) basis in the presence of mosaicism, even at a single probe level, and with controllable sensitivity and false discovery rates. Further, calculate CNV covariates for association testing and visualize copy number data in a genome browser.
A new dynamic analytic visualization tool with integrated genome browser offers exceptional flexibility in how you visualize data and present results. Gain greater insights with unprecedented whole genome views and navigation control. Apply data transformations or analytic functions in real-time. When you finalize the view you want, save your plots to a number of publication quality formats, including scalable vector graphics.
Having collaborated on over twenty SNP and CNV genome-wide association studies, we understand how critical high quality data is for achieving quality results. Therefore, considerable effort has been made to enhance quality assurance at every step. You can now easily generate a number of genotype statistics, view cluster plots of allele intensities, check gender and marker concordance, perform variance analysis on log ratios, filter poor quality markers and samples, and more.
In addition to standard quality assurance measures, SVS 7 offers a powerful principal component analysis (PCA) approach for both SNP and CNV data to simultaneously correct for batch effects, genomic waves, and population stratification. New enhancements include streamlined plotting of principal components and the ability to correct data using pre-computed principal components from a subset of markers (e.g. ancestry informative markers).
The sheer size and complexity of whole genome data make it extremely difficult to work with. SVS 7 eliminates the hassles with real-time spreadsheet manipulation, data editing, and enrichment. Easily combine multiple sample sets and data of different types, from different arrays, or even platforms. Quickly recode genotypes based on a specified genetic model, flip DNA strands, transcode from AB to ACGT format, and more. Further, an integrated spreadsheet editor facilitates data editing and transformation on a grand scale.
An advanced regression module allows you to perform linear and logistic regression, stepwise regression (both backward elimination and forward selection), and permutation tests with numeric variables and recoded genotypes. You can use a moving window along with numeric or categorical covariates against a single dependent variable. Regressions may be performed either with all variables and covariates together (“full model”) or with some of the covariates grouped into a “reduced model” (yielding a full-vs-reduced model p-value).
Automate workflows, incorporate custom methods, or interoperate with other programs. These are just a few examples of how you can enhance the utility of SVS 7 with a fully programmatic Python scripting interface. New to SVS 7 is an integrated Python script editor that makes it easy to read and write scripts, helping even novice users realize the power of scripting.
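A full-vs-reduced comparison is a nested-model test. The text does not say which statistic SVS reports; as an illustration only, the standard choices are the F-test for linear regression and the likelihood-ratio test for logistic regression. Here RSS is the residual sum of squares, ℓ the maximized log-likelihood, p_f and p_r the numbers of parameters (including the intercept) in the full and reduced models, and n the number of samples:

```latex
F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_f)/(p_f - p_r)}{\mathrm{RSS}_f/(n - p_f)}
    \sim F_{p_f - p_r,\; n - p_f}
\qquad
\Lambda = 2\,(\ell_f - \ell_r) \;\overset{\text{approx.}}{\sim}\; \chi^2_{p_f - p_r}
```

A small p-value from either statistic indicates that the terms present only in the full model explain significantly more of the dependent variable than the reduced model does.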