Any validated bioinformatics pipeline must be continuously monitored. Quality management in clinical testing labs ensures that any divergence from predefined quality metrics during the analysis of clinical samples is investigated. For example:
- The number of sequence reads passing the predefined base quality score threshold is insufficient
- The number of variants identified in a data set deviates substantially from the expected value
An appropriate quality management system provides the framework for investigating any divergence from the designed analytical process. It should allow the appointed investigator to determine possible root causes and to define corrective actions going forward. Laboratories are also expected to keep a record of all deviations from expected results.
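The two example checks above can be sketched in a few lines of Python. This is only an illustration: the threshold values, the `QCThresholds` and `check_sample` names, and the fields of the deviation record are assumptions made for the example, not values or structures prescribed by any standard. Each lab would set its own thresholds during validation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class QCThresholds:
    # Hypothetical values for illustration only.
    min_reads_passing_quality: int = 20_000_000
    expected_variant_count: int = 4_500_000
    max_variant_deviation: float = 0.15  # allowed relative deviation


def check_sample(sample_id, reads_passing_quality, variant_count, thresholds):
    """Return one deviation record per failed metric (empty list if none)."""
    deviations = []

    def record(metric, observed, expected):
        deviations.append({
            "sample_id": sample_id,
            "metric": metric,
            "observed": observed,
            "expected": expected,
            "logged_at": datetime.now(timezone.utc).isoformat(),
            # Filled in later by the appointed investigator.
            "root_cause": None,
            "corrective_action": None,
        })

    if reads_passing_quality < thresholds.min_reads_passing_quality:
        record("reads_passing_quality_threshold",
               reads_passing_quality,
               thresholds.min_reads_passing_quality)

    expected = thresholds.expected_variant_count
    relative_deviation = abs(variant_count - expected) / expected
    if relative_deviation > thresholds.max_variant_deviation:
        record("variant_count", variant_count, expected)

    return deviations


if __name__ == "__main__":
    for entry in check_sample("S2", 10_000_000, 6_000_000, QCThresholds()):
        print(entry["sample_id"], entry["metric"], entry["observed"])
```

Keeping the root cause and corrective action as empty fields on the same record means the deviation log and the investigation outcome stay together.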
Updates and Versions
Updates are a fact of life for software products. Commercial software vendors issue updates to fix known bugs and to release new features and capabilities. The same is true of open source packages. Reference databases such as ClinVar are on a monthly update cycle. As much as labs would like to freeze a pipeline, the reality is that the individual software components are subject to constant change. Here are a few thoughts on how to handle this issue:
- Implement a policy for monitoring updates, patch releases and other upgrades to the bioinformatics pipeline. This policy should also outline when these updates will be implemented, for example at set intervals such as quarterly, depending on the impact and urgency of the update. It goes without saying that once an update occurs, the entire pipeline has to be re-validated.
- It is advisable for the laboratory to maintain records that document how the monitoring and implementation of updates will actually occur.
- Create a lab-specific versioning system that keeps track of the configuration of a pipeline. Documentation should record the version of the overall pipeline, as well as the versions of the individual software packages. Any scripts written by staff bioinformaticians should be versioned as well. The version number of the overall pipeline should be incorporated into any clinical report that uses data generated by that pipeline.
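One way to picture such a versioning system is a small manifest that captures the pipeline version, the component and database versions, and a hash of each in-house script. The sketch below is a minimal illustration, not an established format: the function names, the manifest layout, the component names and the version strings are all hypothetical.

```python
import hashlib
import json


def sha256_of_text(text):
    """Hex SHA-256 digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_manifest(pipeline_version, components, scripts, databases):
    """Describe one pipeline configuration.

    components / databases: name -> version string
    scripts: script name -> source text of the in-house script
    """
    manifest = {
        "pipeline_version": pipeline_version,
        "components": dict(sorted(components.items())),
        "databases": dict(sorted(databases.items())),
        # Hash in-house scripts so that any edit is detectable.
        "scripts": {name: sha256_of_text(src)
                    for name, src in sorted(scripts.items())},
    }
    # A short fingerprint of the whole configuration, suitable for
    # printing on a report next to the pipeline version.
    canonical = json.dumps(manifest, sort_keys=True)
    manifest["fingerprint"] = sha256_of_text(canonical)[:12]
    return manifest


if __name__ == "__main__":
    manifest = build_manifest(
        pipeline_version="2.3.0",                             # hypothetical
        components={"aligner": "1.0", "variant_caller": "4.2"},  # hypothetical
        scripts={"filter_variants.py": "print('placeholder')"},
        databases={"clinvar": "2024-01"},                     # hypothetical release label
    )
    print(json.dumps(manifest, indent=2))
```

Because the fingerprint is derived from every recorded version and script hash, a change to any single component produces a different fingerprint, which makes an unrecorded configuration change easy to spot.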
Data Storage
A laboratory must be able to explain how it conducted its work. It needs to be able to show why and how the pipeline at a given point in time created results that were in line with best practices at that time. For that, it is necessary to store all data files generated by the bioinformatics pipeline. Of course, this can be a massive undertaking, as the number of processed samples can run into the thousands, tens of thousands or beyond in a given year.
At the very minimum, there needs to be a data retention policy in place that outlines which files are retained. For example, it is possible to re-compute BAM files from the FASTQ files or vice versa, so it is not necessary to archive both data sets for any given sample. Labs should also be aware of any local, state and national requirements for the storage of data.
Labs that operate to a higher standard are also putting systems in place that allow them to revisit clinical reports generated in the past. We are operating in a very dynamic field. The scientific community constantly publishes new information about the association of a variant with a particular disease. New treatment options become available. Either could alter the diagnosis and treatment selection for a particular patient as time progresses. Warehousing sample data and versioning the pipeline, including its databases, allows a lab to revisit past analyses efficiently…
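A retention policy of this kind can be written down as a simple lookup that the archiving step consults. The following is a rough sketch under stated assumptions: the `RETENTION_POLICY` table, the choice to keep FASTQ rather than BAM, and the function names are illustrative only, and real retention rules have to follow the applicable local, state and national requirements.

```python
from pathlib import PurePosixPath

# Hypothetical policy: archive FASTQ and VCF, skip BAM because it can be
# re-computed from the FASTQ files.
RETENTION_POLICY = {
    "fastq": True,
    "bam": False,
    "vcf": True,
}


def file_type(path):
    """Return the lower-case file type, ignoring a trailing .gz."""
    suffixes = [s.lstrip(".").lower() for s in PurePosixPath(path).suffixes]
    if suffixes and suffixes[-1] == "gz":
        suffixes = suffixes[:-1]
    return suffixes[-1] if suffixes else ""


def plan_archive(paths, policy=RETENTION_POLICY):
    """Split paths into (archive, skip) lists according to the policy.

    File types not listed in the policy are archived, which is the
    conservative default.
    """
    archive, skip = [], []
    for path in paths:
        if policy.get(file_type(path), True):
            archive.append(path)
        else:
            skip.append(path)
    return archive, skip


if __name__ == "__main__":
    archive, skip = plan_archive(
        ["S1.fastq.gz", "S1.sorted.bam", "S1.vcf.gz", "S1.notes.txt"]
    )
    print("archive:", archive)
    print("skip:", skip)
```

Defaulting unknown file types to "archive" errs on the side of keeping data until the policy explicitly says otherwise.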