Genetic testing labs deal with personal data in categories with the highest level of security requirements: personal identity and medical records. Given the liability and risk associated with a breach of this secure information, it is not surprising that many labs and institutes that aggregate genomic data prefer, if not require, on-premise analysis and storage solutions.
Golden Helix is in a unique position to provide completely on-premise analysis solutions, with a history of building analysis software from the ground up on first principles and a focus on providing integrated, turn-key solutions. This allows for a licensing model based on training and supporting users, not tracking per-sample usage of cloud resources.
As regulators around the world strengthen the privacy rights of individuals, and as the outcry over data breaches raises the stakes for building a secure system, making any new genomic analysis workflow secure should be a concern for every testing laboratory.
Considerations of a Secure Genomic Analysis Pipeline
Ultimately, the security of a lab's genomic data depends on where it is stored. Larger laboratories that provide a range of testing services have been slow to adopt cloud-hosted solutions, largely because doing so means relying on the security of third parties and inherently sharing that data with the solution vendor. In the US, this data sharing also requires signing additional legal liability documents with the vendor to satisfy HIPAA.
Remember: one data breach could jeopardize the solvency of a small clinical lab. In this light, the extra cost, IT overhead and restrictions involved in hosting your own analysis solution behind a restrictive firewall are understandable.
Beyond hosting the solution on-premise, clinical labs operating under CLIA must ensure that a test produces the same results regardless of when it is run. This requires that software updates, annotation updates and any other changes to the analysis system happen only at fixed times throughout the year, and only when the lab is prepared to validate a new version of its procedures for a test.
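To make the idea of version-locking concrete, here is a minimal sketch in Python, assuming a hypothetical manifest of annotation files and digests recorded when the pipeline was last validated. The file names, versions and digests are placeholders, and this is a generic illustration rather than an actual Golden Helix mechanism:

```python
import hashlib
import pathlib
import sys

# Hypothetical version-lock manifest written at validation time; the software
# version, file names and digests below are placeholders, not real artifacts.
MANIFEST = {
    "software": {"varseq": "2.x"},
    "annotations": {
        "clinvar_20240101.vcf.gz": "sha256 hex digest recorded at validation",
        "gnomad_v4_exomes.vcf.gz": "sha256 hex digest recorded at validation",
    },
}

def sha256(path: pathlib.Path) -> str:
    """Compute the SHA-256 digest of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(annotation_dir: str) -> bool:
    """Return True only if every annotation file matches its validated digest,
    so a test run today reproduces the results of the validated pipeline."""
    ok = True
    for name, expected in MANIFEST["annotations"].items():
        actual = sha256(pathlib.Path(annotation_dir) / name)
        if actual != expected:
            print(f"annotation drift detected: {name}", file=sys.stderr)
            ok = False
    return ok
```

A check like this can run before any clinical pipeline starts, refusing to proceed if an annotation source has changed outside of a planned, validated update window.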
On-Premise, with Local Data and User Authentication
One reason most complex analysis solutions can only be deployed through cloud-based environments is the difficulty of wrangling the many large and constantly changing annotation sources required by any clinical genetic testing pipeline. These include databases like gnomAD, with hundreds of millions of known variants, as well as sources like ClinVar, with its troves of pathogenic assertions. Loading these sources into databases and using off-the-shelf query tools may seem like the obvious approach, but it forces an architecture that relies on database servers and removes the flexibility to change versions, or use multiple versions, of annotation sources.
In contrast, the Golden Helix approach of using file-based, internally compressed and genomically indexed annotation sources allows annotations to be on-premise, offline and version-locked. Additionally, in certain cases an institution wants its users to have full access to the complete repository of genomic annotations on our cloud-based data server, but operates in a restrictive environment where even simple query and download internet access is not possible. We have recently introduced the ability to mirror our entire annotation archive to a private annotation server that works seamlessly with local copies of VarSeq and VSClinical.
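To illustrate the general idea of file-based, indexed annotations, the sketch below queries a bgzip-compressed, tabix-indexed VCF with the open-source pysam library. The file path and ClinVar snapshot name are assumptions for the example, and this stands in for the concept of indexed flat-file annotations rather than VarSeq's own annotation format:

```python
import pysam

# Hypothetical local, version-locked ClinVar snapshot (bgzip-compressed VCF with
# a tabix index alongside it); the path is illustrative only.
CLINVAR = "/data/annotations/clinvar_20240101.vcf.gz"

def annotate_region(chrom, start, end):
    """Look up known variants in a genomic region entirely from local files,
    with no database server or internet connection required."""
    with pysam.VariantFile(CLINVAR) as vcf:
        for rec in vcf.fetch(chrom, start, end):
            clnsig = rec.info.get("CLNSIG", ("unknown",))
            yield rec.chrom, rec.pos, rec.ref, ",".join(rec.alts or ()), clnsig

# Example: list assertions overlapping a region of interest (coordinates illustrative).
for hit in annotate_region("17", 43044295, 43045802):
    print(hit)
```

Because each annotation source is just a compressed, indexed file on disk, a lab can keep several versions side by side and pin a pipeline to the exact files it was validated against.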
Finally, for high-security environments, we now support several ways of logging into the software without internet access. These cover scenarios ranging from running our command-line workflow automation on internal compute clusters to letting users log in with existing institutional credentials through a locally deployed single sign-on server.
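As one example of authenticating against local institutional credentials, the sketch below binds to an internal LDAP / Active Directory server using the open-source ldap3 package. The host name and DN template are placeholders, and this shows the general pattern of firewall-contained authentication rather than Golden Helix's specific single sign-on integration:

```python
from ldap3 import Server, Connection, ALL

# Hypothetical institutional directory reachable only on the internal network;
# host and DN template are placeholders, not Golden Helix settings.
LDAP_HOST = "ldap.internal.example.org"
USER_DN_TEMPLATE = "uid={username},ou=people,dc=example,dc=org"

def authenticate(username: str, password: str) -> bool:
    """Validate credentials with a simple bind against the local directory.
    Nothing leaves the firewall; no internet access is required."""
    server = Server(LDAP_HOST, use_ssl=True, get_info=ALL)
    conn = Connection(
        server,
        user=USER_DN_TEMPLATE.format(username=username),
        password=password,
    )
    ok = conn.bind()
    conn.unbind()
    return ok

if __name__ == "__main__":
    print("login ok" if authenticate("jdoe", "s3cret") else "login failed")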
In summary, in this webcast on the best practices for building secure, offline genomic analysis pipelines, we cover:
- Building a FASTQ to clinical reports pipeline behind a firewall
- On-premise analysis, warehouse and data servers independent of the internet
- Single sign-on based on local credential systems and without internet access
- Storage and network considerations for the analysis of patient-linked data
- Choosing when to update and validate new pipelines, data sources and software versions
We hope you enjoy our review of the capabilities and best practices for building the most secure environment to host the analytics behind your precision medicine tests.