A scalable, multi-project warehouse for NGS variant call sets, clinical reports and catalogs of variant assessments.

Features

As Precision Medicine takes off, the number of samples in a testing lab and the associated data volume are increasing exponentially. To organize the data and build a knowledge base of cases that can be used for future analysis as well as ongoing research, labs need to leverage state-of-the-art warehousing technology. Building on the algorithms and high-performance storage technology powering the VarSeq® software, VSWarehouse is a scalable, multi-project warehouse for NGS variant call sets, clinical reports and catalogs of variant assessments.

Organize samples into Projects

Rather than having a costly and mutable single large relational model, VSWarehouse builds on the highly-performant storage technology developed by VarSeq to allow your samples to be organized in as many fully-versioned projects as needed in a fraction of the space. As new samples get uploaded from VarSeq's integrated VSWarehouse uploader, a background job is queued and run to create a new version of the project.

Scalable Technology

VSWarehouse is built on the Postgres database technology stack with a completely customized and optimized storage and query-execution layer. Taking advantage of the matrix structure of genomic data, a very space-efficient columnar and compressed storage engine allows projects computed with VarSeq's mature NGS data wrangling and annotation algorithms to be stored at a fraction of the size of traditional databases while still allowing for the full power and utility of a mature SQL front-end.
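To make the idea of columnar, compressed storage concrete, here is a minimal Python sketch. It is not VSWarehouse's storage engine: the record layout, the function names and the use of zlib are all assumptions chosen for illustration. It only shows why storing each field of a variant table as its own column tends to compress well, since values within a column repeat.

```python
import zlib

# Hypothetical variant records (chrom, pos, ref, alt); not a real VSWarehouse schema.
records = [("chr1", 1000 + i, "A", "G") for i in range(1000)]


def compress_columns(rows):
    """Store each field as its own newline-joined, zlib-compressed column."""
    columns = zip(*rows)  # transpose rows into columns
    return [
        zlib.compress("\n".join(map(str, column)).encode("utf-8"))
        for column in columns
    ]


def decompress_columns(blobs):
    """Rebuild rows (as tuples of strings) from the compressed columns."""
    columns = [zlib.decompress(blob).decode("utf-8").split("\n") for blob in blobs]
    return list(zip(*columns))


blobs = compress_columns(records)
raw_size = sum(len(str(value)) for row in records for value in row)
compressed_size = sum(len(blob) for blob in blobs)
print(f"raw field bytes: {raw_size}, compressed column bytes: {compressed_size}")
```

The round trip returns every value as a string, so a real engine would also track column types; that detail is left out to keep the sketch short.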

Variant Assessment Catalogs

VarSeq strives to provide all the high- and low-level details needed for a variant scientist or medical professional to classify or QC variants for a specific sample or presenting phenotype. Our Assessment Catalog feature offers a flexible way to capture lab-specific flags or classifications of variants outside of the single-project context, so they can be used as an annotation source for future projects. VSWarehouse acts as the hosting server for these assessment catalogs, providing a web interface in which to query and manage them.

Central VSReports Hosting

VSReports allows for customizable report templates to be completed on a sample-by-sample basis in VarSeq. These sample-level decisions and the rendered report are saved at the project level (and are exportable as HTML or PDF). VSWarehouse provides the same user experience within VarSeq, but the reports are hosted, saved and indexed on the VSWarehouse server. All reports can then be queried at the variant or sample level, and the rendered reports hosted on the server are ready for download or integration with other internal systems.

Projects as Variant Frequency Annotations

Projects hosted on VSWarehouse can be used as annotation sources in VarSeq to be integrated into your custom variant annotation and interpretation workflow. This allows any new variant to be annotated and potentially filtered with the frequency of that variant in your warehouse projects.
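As a rough illustration of frequency-based annotation and filtering, the Python sketch below computes, for each variant, the fraction of project samples in which it was observed, then annotates and filters a new list of variants. The data, the variant key format and all function names are hypothetical, and the calculation is a simple carrier fraction rather than whatever VSWarehouse computes internally.

```python
from collections import Counter

# Hypothetical project content: sample name -> set of variant keys seen in it.
project_calls = {
    "sample_1": {"chr1:1000:A>G", "chr2:500:C>T"},
    "sample_2": {"chr1:1000:A>G"},
    "sample_3": {"chr1:1000:A>G", "chr3:42:G>A"},
    "sample_4": set(),
}


def carrier_frequencies(calls):
    """Fraction of samples in which each variant was observed."""
    n_samples = len(calls)
    if n_samples == 0:
        return {}
    counts = Counter(v for variants in calls.values() for v in variants)
    return {variant: count / n_samples for variant, count in counts.items()}


def annotate(variants, frequencies):
    """Attach the project frequency to each variant; unseen variants get 0.0."""
    return {variant: frequencies.get(variant, 0.0) for variant in variants}


def filter_rare(annotated, max_frequency):
    """Keep variants whose project frequency is at or below the threshold."""
    return sorted(v for v, freq in annotated.items() if freq <= max_frequency)


frequencies = carrier_frequencies(project_calls)
new_variants = ["chr1:1000:A>G", "chr2:500:C>T", "chr9:7:T>C"]
annotated = annotate(new_variants, frequencies)
print(filter_rare(annotated, max_frequency=0.25))
```

A variant seen in three of four samples is dropped by the 0.25 threshold, while a variant never seen in the project is kept with a frequency of 0.0.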

Multiple Interfaces

Without losing a single piece of information in the VCFs, VSWarehouse creates a single annotated matrix of all unique variants for all uploaded samples. This matrix is accessible through multiple interfaces, including a web-based interface, an annotation interface from VarSeq and more.
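The "single matrix of all unique variants" can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the input layout, the function name, and the use of "./." as a placeholder when a sample has no recorded call for a variant. It does not describe how VSWarehouse actually merges uploads.

```python
MISSING = "./."  # assumption: placeholder when a sample has no call for a variant


def build_variant_matrix(sample_genotypes):
    """Merge per-sample calls into one variants-by-samples genotype matrix.

    sample_genotypes maps sample name -> {(chrom, pos, ref, alt): genotype}.
    Returns (variants, samples, matrix) with matrix[i][j] holding the genotype
    of variants[i] in samples[j].
    """
    samples = sorted(sample_genotypes)
    variants = sorted({key for calls in sample_genotypes.values() for key in calls})
    matrix = [
        [sample_genotypes[sample].get(key, MISSING) for sample in samples]
        for key in variants
    ]
    return variants, samples, matrix


# Hypothetical uploads from two samples.
uploads = {
    "sample_A": {("chr1", 1000, "A", "G"): "0/1", ("chr2", 500, "C", "T"): "1/1"},
    "sample_B": {("chr1", 1000, "A", "G"): "1/1"},
}

variants, samples, matrix = build_variant_matrix(uploads)
for key, row in zip(variants, matrix):
    print(key, dict(zip(samples, row)))
```

Each unique variant appears once as a row, no matter how many samples carry it, which is what lets the same structure serve both sample-level and variant-level queries.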

Use Cases

Managing Massive Amounts of Genetic Data

As Next-Generation Sequencing is taking off in the clinic, it creates a significant data management issue for clinicians, scientists and IT professionals alike.

How can we retain the massive amounts of data coming out of clinical pipelines in a way that enables labs to systematically build a knowledge base capturing the insights clinicians gain from analyzing their patients' genetic information day to day? What infrastructure is required to alert medical personnel to new research that could potentially alter medical decisions? And how can we embed the work being done in the labs into general hospital workflows? Data warehousing is a pivotal technology that can help in all of these areas.

Relevant Data Warehouse Concepts

In a nutshell, a data warehouse integrates the following concepts:

  • Take data from all relevant operational systems. In a clinical NGS testing setting, we are likely talking about BAM files, VCF files and clinical reports, among others.
  • Overlay data from the outside, such as industry benchmark data. In our domain, clinicians will likely leverage databases such as COSMIC, ClinVar, dbSNP, ExAC 61,000 Exomes and many more.
  • Store data in a suitable format to allow for easy access and decision making. This requires the definition of a unified warehouse data model.
  • Allow the deployment of analytics: creation of dashboards that give users insight into the content of the warehouse, and the ability to query the data models, e.g. how many samples do we have, or what variants have we seen in this gene?
  • Provide mechanisms to connect with external systems.
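The example questions in the analytics point above ("how many samples", "what variants have we seen in this gene") map naturally onto SQL. The Python sketch below uses an in-memory SQLite database as a stand-in; the `calls` table, its columns, the gene names and the helper functions are all hypothetical and are not the VSWarehouse data model.

```python
import sqlite3

# Hypothetical rows: (sample, gene, variant). Names and positions are made up.
ROWS = [
    ("sample_1", "GENE_A", "chr1:1000:A>G"),
    ("sample_2", "GENE_A", "chr1:1000:A>G"),
    ("sample_2", "GENE_A", "chr1:1200:C>T"),
    ("sample_3", "GENE_B", "chr2:500:G>A"),
]


def build_demo_db(rows):
    """Create an in-memory database holding one simple table of variant calls."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (sample TEXT, gene TEXT, variant TEXT)")
    conn.executemany("INSERT INTO calls VALUES (?, ?, ?)", rows)
    return conn


def count_samples(conn):
    """How many distinct samples are in the warehouse?"""
    return conn.execute("SELECT COUNT(DISTINCT sample) FROM calls").fetchone()[0]


def variants_in_gene(conn, gene):
    """Which variants have been seen in a gene, and in how many samples?"""
    query = """
        SELECT variant, COUNT(DISTINCT sample) AS n_samples
        FROM calls
        WHERE gene = ?
        GROUP BY variant
        ORDER BY variant
    """
    return conn.execute(query, (gene,)).fetchall()


conn = build_demo_db(ROWS)
print(count_samples(conn))
print(variants_in_gene(conn, "GENE_A"))
```

A dashboard would typically run queries of this shape on a schedule or on demand and render the counts, rather than printing them.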

Easily Reference Previous Work

Among other things, our customers deploy our warehouse solution to reference past work in their ongoing clinical interpretation, to maintain their own assessment catalogs, and to determine allele frequencies for a specific population or disease category.

Case Studies

We know our software will exceed your expectations. But don't just take it from us; see how our customers have benefited from it.

Recommended Learning Materials

We have a variety of supplemental learning materials that are an excellent resource for anyone interested in the industry or our software solutions. Here are some of our recommended materials for you to check out related to VSWarehouse!

eBooks

Read our eBook on the data explosion in genetics and how warehousing will come into play!

Webcasts

Learn how to leverage our state-of-the-art genetic data warehousing technology.

Getting Started with VSWarehouse

Watch Now

Other Resources

Explore a clinical workflow in VarSeq or follow along with a tutorial!

VarSeq Viewer:
Download Here


VSWarehouse Tutorial:
Download Here

Evaluation

Request a free trial of VarSeq!

Stay updated with exclusive eBooks, timely invitations to webcasts and events, and other communications from Golden Helix.

Technical Specifications

GENERAL PURPOSE HARDWARE REQUIREMENTS

4 GB of RAM

Multicore CPU

100GB of space available for annotations and projects

ADVANCED AND WHOLE GENOME WORKFLOW HARDWARE REQUIREMENTS

If you are working with whole exomes or genomes, especially with hundreds to thousands of samples, we suggest a high-memory configuration and plenty of storage capacity:

16GB+ of RAM (32GB for Servers)

8+ CPU Cores

1TB of space available for annotations and projects

OPERATING SYSTEMS

The following operating systems are supported:

64-bit Windows 7 or later (32-bit also supported, but not recommended)

Linux Ubuntu 14.04 or later (64-bit only)

Linux RHEL 6 or later, or equivalently CentOS 6 or later (64-bit only)

Mac OS X 10.9 or later

SERVER CONFIGURATIONS

With a server license, you can install your Golden Helix software solution on a server with multi-user access and shared resources. You can launch any number of instances of the software on the same host, limited only by the CPU, memory and disk resources of the server.

On Windows, you need the multi-user Remote Desktop capability, which is available only on Windows Server. We support Windows Server 2008 or newer.

On Linux, clients can log in from any operating system using SSH and open the Golden Helix software through X11 tunneling. On Windows clients, we suggest a solution like MobaXterm, which provides an all-in-one SSH client and X11 server for easy login, file transfer and opening of remote GUI applications.

PROXY SETTINGS, FIREWALLS AND ANTIVIRUS

Golden Helix VarSeq and SVS can be configured to access the internet through a SOCKS5 or HTTP/HTTPS Tunneling Proxy. Go to Tools -> Proxy Settings… to configure.

The software only needs to make outgoing connections on standard HTTP/HTTPS ports and protocols. If a local firewall is installed that prevents these types of outgoing connections (this is very uncommon), firewall rules will need to be created to whitelist the software.
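If you suspect a firewall is blocking outgoing connections, a quick check from the affected machine can help narrow it down. The Python sketch below simply attempts direct TCP connections on the standard HTTP and HTTPS ports. The function names are made up for this example, and the host you test against is up to you; this page does not list the servers the software contacts, so no real hostname is assumed here.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if an outgoing TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def check_outbound(host, ports=(80, 443), timeout=3.0):
    """Check the standard HTTP (80) and HTTPS (443) ports for one host."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Calling `check_outbound` with a hostname of your choice returns a dictionary mapping each port to True or False. Note that this tests a direct connection only; if your site requires a proxy, a failure here does not mean the software cannot reach the internet once its proxy settings are configured.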

Note that we have run into numerous issues where aggressive antivirus programs prevent the product from performing normal operations such as opening files and logging in. You may need to whitelist Golden Helix executables or disable these tools to perform your analytics.