As precision medicine takes off, the number of samples in a testing lab and the associated data volume are growing rapidly. To organize the data and build a knowledge base of cases that can be used for future analysis as well as ongoing research, labs need to leverage state-of-the-art warehousing technology. Building on the algorithms and high-performance storage technology powering the VarSeq® software, VSWarehouse is a scalable, multi-project warehouse for NGS variant call sets, clinical reports and catalogs of variant assessments.
Organize Samples into Projects
Rather than relying on a single large, costly and mutable relational model, VSWarehouse builds on the high-performance storage technology developed for VarSeq, letting you organize your samples into as many fully versioned projects as needed, in a fraction of the space. When new samples are uploaded through VarSeq's integrated VSWarehouse uploader, a background job is queued and run to create a new version of the project.
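The versioning workflow described above (upload, queue a background job, produce a new project version) can be illustrated with a small in-memory sketch. It assumes a simple model in which each version is an immutable snapshot of the project's sample list; the classes `Project` and `ProjectVersion` and the function `version_worker` are hypothetical and do not reflect how VSWarehouse actually stores versions.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ProjectVersion:
    """Immutable snapshot of a project's sample list (hypothetical model)."""
    number: int
    samples: Tuple[str, ...]


@dataclass
class Project:
    name: str
    versions: List[ProjectVersion] = field(default_factory=list)

    def latest_samples(self) -> Tuple[str, ...]:
        return self.versions[-1].samples if self.versions else ()


def version_worker(jobs: "queue.Queue[Optional[List[str]]]", project: Project) -> None:
    """Background job: each queued upload produces a new project version."""
    while True:
        upload = jobs.get()
        if upload is None:  # sentinel value tells the worker to stop
            jobs.task_done()
            return
        # Existing versions are never modified; a new snapshot is appended instead.
        merged = project.latest_samples() + tuple(upload)
        project.versions.append(ProjectVersion(len(project.versions) + 1, merged))
        jobs.task_done()


project = Project("example-project")
jobs: "queue.Queue[Optional[List[str]]]" = queue.Queue()
worker = threading.Thread(target=version_worker, args=(jobs, project))
worker.start()
jobs.put(["S1", "S2"])  # first upload batch
jobs.put(["S3"])        # second upload batch
jobs.put(None)          # shut the worker down
worker.join()
for v in project.versions:
    print(v.number, v.samples)
# 1 ('S1', 'S2')
# 2 ('S1', 'S2', 'S3')
```

Because old snapshots are never mutated, any earlier version stays available for reproducing a past analysis.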
VSWarehouse is built on the Postgres database technology stack with a completely customized and optimized storage and query-execution layer. The storage engine takes advantage of the matrix structure of genomic data: a space-efficient, compressed columnar format allows projects computed with VarSeq's mature NGS data wrangling and annotation algorithms to be stored at a fraction of the size of traditional databases, while still offering the full power and utility of a mature SQL front end.
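To show why a columnar layout suits a variant-by-sample matrix, here is a minimal sketch of one generic compression technique, run-length encoding of a single genotype column (one sample's calls across a project's variant rows). This is an illustration only; the text does not describe VSWarehouse's actual encoding, and `rle_encode` and `rle_decode` are hypothetical helpers.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(column: List[str]) -> List[Tuple[str, int]]:
    """Run-length encode one column of genotype calls (hypothetical helper)."""
    # Consecutive identical values collapse into a single (value, run_length) pair.
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs: List[Tuple[str, int]]) -> List[str]:
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# Most sites are homozygous reference for any given sample, so a column stored
# contiguously collapses into very few runs.
genotypes = ["0/0"] * 6 + ["0/1"] + ["0/0"] * 3
runs = rle_encode(genotypes)
print(runs)  # [('0/0', 6), ('0/1', 1), ('0/0', 3)]
assert rle_decode(runs) == genotypes
```

A row-oriented layout would interleave many different fields per record, which breaks up these long runs of identical values and compresses less well.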
Variant Assessment Catalogs
VarSeq strives to provide all the high- and low-level details needed for a variant scientist or medical professional to classify or QC variants for a specific sample or presenting phenotype. Our Assessment Catalog feature provides a flexible way to capture lab-specific flags or classifications of variants outside of a single-project context, so they can be used as an annotation source for future projects. VSWarehouse acts as the hosting server for these assessment catalogs, providing a web interface for querying and managing them.
Central VSReports Hosting
VSReports lets users complete customizable report templates on a sample-by-sample basis in VarSeq. These sample-level decisions and the rendered report are saved at the project level (and can be exported as HTML or PDF). VSWarehouse offers the same user experience within VarSeq; however, the reports are hosted, saved and indexed on the VSWarehouse server. All reports can then be queried at the variant or sample level, and the rendered reports hosted on the server are ready for download or integration with other internal systems.
Projects as Variant Frequency Annotations
Projects hosted on VSWarehouse can be used as annotation sources in VarSeq and integrated into your custom variant annotation and interpretation workflow. This allows any new variant to be annotated, and potentially filtered, by its frequency in your warehouse projects.
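The idea of using warehouse projects as a frequency annotation can be sketched as follows. This assumes genotypes are stored per variant as VCF-style strings, that sites are biallelic, and that a variant never called in the warehouse gets a frequency of 0.0; the names `allele_frequency` and `annotate`, the toy coordinates and the 5% cutoff are all hypothetical and not the VarSeq annotation mechanism itself.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele)
VariantKey = Tuple[str, int, str, str]


def allele_frequency(genotypes: List[str]) -> Optional[float]:
    """Alternate-allele frequency over called genotypes such as '0/1' or '1|1'.

    Assumes biallelic sites; genotypes containing a no-call '.' are skipped.
    """
    alt = total = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue
        total += len(alleles)
        alt += sum(1 for a in alleles if a != "0")
    return alt / total if total else None


def annotate(new_variants: Iterable[VariantKey],
             warehouse: Dict[VariantKey, List[str]]) -> Dict[VariantKey, float]:
    """Attach the warehouse frequency to each new variant (0.0 if never called)."""
    annotated = {}
    for key in new_variants:
        af = allele_frequency(warehouse.get(key, []))
        annotated[key] = af if af is not None else 0.0
    return annotated


# Toy warehouse contents; coordinates are made up for illustration.
warehouse = {
    ("chr1", 1000, "A", "G"): ["0/0", "0/1", "1/1", "0/0"],
    ("chr2", 2000, "C", "T"): ["0/1", "./.", "0/0"],
}
new = [("chr1", 1000, "A", "G"), ("chr7", 3000, "T", "C")]
freqs = annotate(new, warehouse)
# Keep only variants that are rare (< 5%) across the warehouse projects.
rare = [key for key, af in freqs.items() if af < 0.05]
print(rare)  # [('chr7', 3000, 'T', 'C')]
```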
Without losing a single piece of information in the VCFs, VSWarehouse creates a single annotated matrix of all unique variants for all uploaded samples, accessible through multiple interfaces, including a web-based interface, the annotation interface in VarSeq and more.
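Conceptually, merging per-sample call sets into one matrix of unique variants looks like the sketch below. It assumes each sample's calls are already parsed into a dictionary keyed by a hypothetical (chrom, pos, ref, alt) tuple, and it marks variants absent from a sample as the no-call "./.", because a plain VCF cannot distinguish "reference" from "not covered". `build_matrix` is a hypothetical name, and rows are sorted lexicographically rather than in karyotypic order.

```python
from typing import Dict, List, Tuple

VariantKey = Tuple[str, int, str, str]  # hypothetical (chrom, pos, ref, alt) key


def build_matrix(samples: Dict[str, Dict[VariantKey, str]]
                 ) -> Tuple[List[VariantKey], List[str], List[List[str]]]:
    """Merge per-sample call sets into one variant-by-sample genotype matrix."""
    sample_names = sorted(samples)
    # Union of every variant seen in any sample; each unique variant is one row.
    variants = sorted({key for calls in samples.values() for key in calls})
    # Missing calls are kept as explicit no-calls rather than assumed reference.
    matrix = [[samples[name].get(key, "./.") for name in sample_names]
              for key in variants]
    return variants, sample_names, matrix


calls = {
    "S1": {("chr1", 100, "A", "G"): "0/1"},
    "S2": {("chr1", 100, "A", "G"): "1/1", ("chr1", 200, "C", "T"): "0/1"},
}
variants, names, matrix = build_matrix(calls)
for key, row in zip(variants, matrix):
    print(key, row)
# ('chr1', 100, 'A', 'G') ['0/1', '1/1']
# ('chr1', 200, 'C', 'T') ['./.', '0/1']
```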
As next-generation sequencing (NGS) takes off in the clinic, it creates a significant data management problem for clinicians, scientists and IT professionals alike.
How can we retain the massive amounts of data coming out of clinical pipelines in a way that enables labs to systematically build a knowledge base capturing the insights clinicians gain on a day-to-day basis analyzing the genetic information of their patients? What infrastructure is required to alert medical personnel to new research that could potentially alter medical decisions? And how can we embed the work being done in the labs into general hospital workflows? Data warehousing is a pivotal technology that can help in all of these areas.
In a nutshell, a data warehouse integrates the following concepts:
- Take data from all relevant operational systems. In a clinical NGS testing setting, we are likely talking about BAM files, VCF files and clinical reports, among others.
- Overlay external data, such as industry benchmark data. In our domain, clinicians will likely leverage databases such as COSMIC, ClinVar, dbSNP, ExAC 61,000 Exomes and many more.
- Store data in a suitable format to allow for easy access and decision making. This requires the definition of a unified warehouse data model.
- Allow the deployment of analytics: create dashboards that give users insight into the content of the warehouse, and enable queries against the data model (e.g. how many samples have we processed, and which variants have we seen in this gene?).
- Provide mechanisms to connect with external systems.
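The two example questions from the analytics item above ("how many samples?" and "which variants have we seen in this gene?") map naturally onto SQL. The sketch below uses Python's built-in `sqlite3` only so that it runs self-contained; VSWarehouse itself is built on Postgres, and the tables `samples`, `variants` and `calls`, their columns and the toy positions are hypothetical, not the product's actual data model.

```python
import sqlite3
from typing import List, Tuple


def demo_queries() -> Tuple[int, List[tuple]]:
    """Answer two dashboard-style questions against a toy, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript("""
        CREATE TABLE samples (sample_id TEXT PRIMARY KEY);
        CREATE TABLE variants (variant_id INTEGER PRIMARY KEY, gene TEXT,
                               chrom TEXT, pos INTEGER, ref TEXT, alt TEXT);
        CREATE TABLE calls (sample_id TEXT, variant_id INTEGER, genotype TEXT);
    """)
    cur.executemany("INSERT INTO samples VALUES (?)", [("S1",), ("S2",), ("S3",)])
    # Toy positions, not real genomic coordinates.
    cur.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "BRCA1", "chr17", 100, "G", "A"),
        (2, "BRCA1", "chr17", 200, "T", "C"),
        (3, "TP53", "chr17", 300, "C", "T"),
    ])
    cur.executemany("INSERT INTO calls VALUES (?, ?, ?)", [
        ("S1", 1, "0/1"), ("S2", 1, "1/1"), ("S3", 2, "0/1"), ("S1", 3, "0/1"),
    ])
    # Question 1: how many samples are in the warehouse?
    (n_samples,) = cur.execute("SELECT COUNT(*) FROM samples").fetchone()
    # Question 2: which variants have we seen in this gene, and in how many samples?
    gene_variants = cur.execute("""
        SELECT v.chrom, v.pos, v.ref, v.alt, COUNT(DISTINCT c.sample_id)
        FROM variants v JOIN calls c ON c.variant_id = v.variant_id
        WHERE v.gene = ?
        GROUP BY v.variant_id
        ORDER BY v.pos
    """, ("BRCA1",)).fetchall()
    conn.close()
    return n_samples, gene_variants


n_samples, gene_variants = demo_queries()
print(n_samples)      # 3
print(gene_variants)  # [('chr17', 100, 'G', 'A', 2), ('chr17', 200, 'T', 'C', 1)]
```

The gene is passed as a bound parameter (`?`) rather than formatted into the SQL string, which is the safe pattern for user-supplied dashboard filters.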
Among other things, our customers deploy our warehouse solution to reference past work in their ongoing clinical interpretation, maintain their own assessment catalogs and determine allele frequencies for a specific population or disease category.
We know our software will exceed your expectations. But don't just take it from us; see how our customers have benefited from it.
Dr. Benjamin Darbro
Director, Shivanand R. Patil Cytogenetics and Molecular Laboratory
VarSeq had everything we were looking for with regards to annotation, filtering sources and an actual visual browser within the software itself. VSReports allows us to take it all the way to the report generation. But what really set it apart in retrospect is how intuitive it is.
Dr. Jeffrey Rosenfeld
Bioinformatics Scientist, Rutgers Cancer Institute of New Jersey
We could have taken a few tools and wrapped them together, but then we are responsible to maintain the system. By choosing a commercial solution, I can count on an entire team of programmers to keep the system updated. On top of that, the cost of an additional high-end programmer to help support an in-house system is more than the cost of a commercial solution.
Recommended Learning Materials
We have a variety of supplemental learning materials that are an excellent resource for anyone interested in the industry or our software solutions. Here are some of our recommended materials for you to check out related to VSWarehouse!
Read our eBook on the data explosion in genetics and how warehousing will come into play!
Learn how to leverage our state-of-the-art genetic data warehousing technology.
Getting Started with VSWarehouse
Request a free trial of VSWarehouse!
Minimum system requirements:
4 GB of RAM
100 GB of space available for annotations and projects
If you are working with whole exomes or genomes, especially with hundreds to thousands of samples, we suggest a high-memory configuration and plenty of storage capacity:
16 GB+ of RAM (32 GB for servers)
8+ CPU cores
1 TB of space available for annotations and projects
The following operating systems are supported:
64-bit Windows 7 or later (32-bit also supported, but not recommended)
Linux Ubuntu 14.04 or later (64-bit only)
Linux RHEL 6 or later, or equivalently CentOS 6 or later (64-bit only)
Mac OS X 10.9 or later
With a server license, you can install your Golden Helix software solution on a server with multi-user access and shared resources. You can launch any number of instances of the software on the same host, limited only by the server's CPU, memory and disk resources.
On Windows, this requires the multi-user Remote Desktop capability, which is only available on Windows Server. We support Windows Server 2008 or newer.
On Linux, clients can log in from any operating system using SSH and open the Golden Helix software through X11 tunneling to interact with it. On Windows clients, we suggest a solution such as MobaXterm, which provides an all-in-one SSH client and X11 server for easy login, file transfer and opening of remote GUI applications.
Golden Helix VarSeq and SVS can be configured to access the internet through a SOCKS5 or HTTP/HTTPS Tunneling Proxy. Go to Tools -> Proxy Settings… to configure.
The software only needs to make outgoing connections on standard HTTP/HTTPS ports and protocols. If a local firewall is installed that prevents these types of outgoing connections (this is very uncommon), firewall rules will need to be created to whitelist the software.
Note that we have encountered numerous cases where aggressive antivirus programs prevent the product from performing normal operations such as opening files and logging in. You may need to whitelist Golden Helix executables or disable these tools to perform your analytics.