This last week I had the pleasure of attending the fourth annual Clinical Genome Conference (TCGC) in Japantown, San Francisco and kicking off the conference by teaching a short course on Personal Genomics Variant Analysis and Interpretation.
Some highlights of the conference from my perspective:
- Talking about clinical genomics is no longer a wonder-fest of individual case studies, but a pragmatic discussion of standards, data sharing and using the right tools for the right job.
- Early detection, prevention, and understanding wellness versus disease states can leverage genomics, but they also involve longitudinal measurements and many human factors.
- Some cancer types, such as non-small cell carcinomas, clearly benefit from integrative analytics of multiple assays (WGS, mate-pair seq for SVs/CNVs, PCR for expression), but the complexity and cost are high. In other words, beyond the relatively simple clinical assays of oncogene panels that suggest targeted molecular therapies, it gets hard fast!
Getting Personal
I always enjoy teaching in the short course format, and this year I was especially looking forward to sharing more on the nuts and bolts of applying public data resources and using tools like GenomeBrowse and VarSeq to explore personal genomes (VarSeq didn't exist the last time I slogged through my exomes in short course form. What a difference!).
Rudy: His own exome; now with ExAC, has a hemizygote G/A that 2/60K individuals also have; ornithine transcarbamylase deficiency #TCGC15
— Dale Yuzuki (@DaleYuzuki) June 22, 2015
I invited KT Pickard to join me in the short course to share his own -omic odyssey of having his genome sequenced, along with his recent crowd-funded personal research project investigating his daughter's autism with whole genome trio analysis using Golden Helix's VarSeq. He just published a post describing the journey up to this point, and I look forward to working with him as he pursues this further.
Pickard: VarSeq by Golden Helix (Gabe Rudy gave earlier talk) http://t.co/Ybe8tSBotM Divides out mutations by type (de novo, etc) #TCGC15
— Dale Yuzuki (@DaleYuzuki) June 22, 2015
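VarSeq's own trio logic isn't described here, but as a rough illustration of what "dividing out mutations by type" means, here is a minimal sketch that classifies one autosomal, biallelic site from unphased proband, mother and father genotypes. The function names, the genotype encoding (alternate-allele count: 0, 1 or 2) and the category labels are hypothetical; real tools also handle sex chromosomes, missing calls, phasing and genotype quality.

```python
def transmissible(gt: int) -> set[int]:
    """Alleles a parent with genotype `gt` can pass on (0 = ref, 1 = alt)."""
    return {0, 1} if gt == 1 else {gt // 2}


def classify_trio_site(child: int, mother: int, father: int) -> str:
    """Hypothetical inheritance classification for one autosomal biallelic site."""
    for gt in (child, mother, father):
        if gt not in (0, 1, 2):
            raise ValueError(f"genotype must be 0, 1 or 2, got {gt}")
    if child == 0:
        return "absent"
    mom, dad = transmissible(mother), transmissible(father)
    # Mendelian-consistent if one allele from each parent can sum to the child's count.
    consistent = any(m + d == child for m in mom for d in dad)
    if not consistent:
        # Both parents hom-ref but the child carries an alt allele: candidate de novo.
        if mother == 0 and father == 0:
            return "de novo"
        return "mendelian error"
    if child == 2:
        return "homozygous"  # one alt allele from each parent
    # Child is heterozygous: which parent(s) could have supplied the alt allele?
    from_mom, from_dad = 1 in mom, 1 in dad
    if from_mom and from_dad:
        return "inherited (either parent)"
    return "inherited (maternal)" if from_mom else "inherited (paternal)"


print(classify_trio_site(1, 0, 0))  # de novo
```

A heterozygous child with two hom-ref parents is only a *candidate* de novo; in practice many such calls turn out to be sequencing or genotyping errors in one of the three samples.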
Longevity and Wellness
A recurring theme of talks at this conference was leveraging recent discoveries and new techniques like 16S sequencing of the microbiome to understand more about healthy individuals, and to potentially predict the onset of chronic disease while its progression may still be preventable.
Nathan Price from the Institute for Systems Biology laid out their extremely ambitious plan to follow up to 100K people over 20-30 years, monitoring them to collect the data needed to potentially predict "transition states" from wellness to chronic disease.
This type of long-term thinking is in short supply, and like the Framingham Heart Study, the project may well become a treasure trove for evidence-based medicine, fueling the kind of policy-making that could fundamentally shift the balance of our healthcare system from disease management to disease prevention!
But to make the case for the project, both to funders and to participants, they successfully finished a 100-person pilot over 3 months that had some extremely useful and interesting results.
.@ISBNathanPrice Did pilot project: 100 person wellness project. WGS, 3x blood/saliva/urine/microbiome, fitbit, coaching sessions #TCGC15
— Gabe Rudy (@gabeinformatics) June 22, 2015
I was also excited to see one of the authors of the paper Whole-Genome Sequencing of the World's Oldest People talk more about the project. I have taken advantage of their generous sharing of the 17 whole genomes:
Thanks Kristen Fortney for making the supercentenarians public! I enjoyed analyzing them! Blog on my analysis http://t.co/X8MN7Y9Qoe #TCGC15
— Gabe Rudy (@gabeinformatics) June 23, 2015
Note: if you would like to use these genomes as reference/control samples in your own projects, you don't need to spend the months we did processing the data; simply grab it from our public annotation repository, available from all our products!
Standards, Consensus and Data Sharing
In previous years, some alarms were raised about whether bioinformatics tools were ready for the clinical big leagues, with dauntingly poor consensus between alignment and variant calling tools when comparing large, genome-wide sets of variant calls.
With efforts like Genome in a Bottle (GIAB), Genome Comparison & Analytics Testing (GCAT) and the ongoing work of various working groups of the Global Alliance for Genomics and Health (GA4GH), we are at the point where, as with any clinical assay, a lot is understood about the sensitivity and specificity of calling NGS variants in various genomic regions and with different capture and sequencing techniques.
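To make "sensitivity and specificity" concrete: benchmarking a pipeline against a GIAB-style truth set comes down to counting true positives, false positives and false negatives. Because true negatives across a genome aren't meaningfully countable, the specificity side is often reported as precision instead. The sketch below is a simplification that assumes variants are already normalized into hypothetical `(chrom, pos, ref, alt)` tuples and restricted to the truth set's high-confidence regions; real comparisons use haplotype-aware matching, since the same variant can be written in several ways.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def benchmark(calls: set[Variant], truth: set[Variant]) -> dict[str, float]:
    """Exact-match comparison of a call set against a truth set (simplified)."""
    tp = len(calls & truth)   # called and in the truth set
    fp = len(calls - truth)   # called but not in the truth set
    fn = len(truth - calls)   # in the truth set but missed
    recall = tp / (tp + fn) if tp + fn else 0.0        # a.k.a. sensitivity
    precision = tp / (tp + fp) if tp + fp else 0.0     # a.k.a. positive predictive value
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "recall": recall, "precision": precision, "f1": f1}


truth = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
         Variant("chr2", 50, "G", "A"), Variant("chr2", 75, "T", "C")}
calls = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
         Variant("chr2", 50, "G", "A"), Variant("chr3", 10, "A", "C")}
print(benchmark(calls, truth))  # 3 TP, 1 FP, 1 FN -> recall = precision = 0.75
```

Stratifying the same counts by region type (e.g. low-complexity sequence or segmental duplications) is what reveals how performance varies across the genome.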
If there is a remaining frontier to be conquered for routine clinical use of NGS, it is in the moving target of annotating variants with public and private sources to inform their clinical utility and interpretation.
While ClinVar has stepped up to be the nexus for labs to share their clinical assessments of variants for germline conditions, there has been a need for a similar de facto standard for cancer mutations. I am hoping the recently announced open source (and open data) CIViC database will fill those shoes.
Prinicples of CIViC: https://t.co/DWJTkpEN13 – Interpretations/debate are open – Interdisciplinary approach – Human and API access #TCGC15
— Gabe Rudy (@gabeinformatics) June 23, 2015
Because it is built entirely from public submissions and the generous hand curation of the literature done by the folks at Washington University in St. Louis, it currently lags behind the likes of MyCancerGenome and commercial vendors like MedGenome OncoMD (now available in VarSeq). However, if it can get some community momentum going, it has a real chance.
The same folks have also put out a Database of Curated Mutations (DoCM) that tracks published onco-driver mutations, as well as a Drug Gene Interaction database (DGIdb) for cataloging molecular therapies.
.@malachigriffith DoCM: * 751 variants, 74 genes, 69 cancer types * Requires pub evidence for relevance * “Gold or Platinum Vars” #TCGC15
— Gabe Rudy (@gabeinformatics) June 23, 2015
Finally, it was great to hear Steven Brenner talk about the CAGI contest.
SB: CAGI – https://t.co/exASTkCrZe runs a contest to predict pathogenicity (like SIFT/PolyPhen). Both generalized and specialized #TCGC15
— Gabe Rudy (@gabeinformatics) June 24, 2015
CAGI takes inspiration from contests in structural bioinformatics that have driven forward the field of accurately predicting the 3D structure and folding of proteins by challenging algorithm writers to predict soon-to-be-published protein structures.
Applying this model to genomics, CAGI runs contests for similarly difficult problems, like using the whole genome sequences of Personal Genome Project participants to predict their phenotypes, as well as predicting the effects of individual variants on protein production (judging predictions against lab-generated data on the actual measured protein levels).
Their last set of challenges was in 2013, and on their biennial schedule they will be starting a new round this summer!
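As a rough illustration of judging variant-effect predictions against lab-measured protein levels, one simple way to score a submission is to correlate predicted values with measured ones. CAGI's assessors choose metrics per challenge, so the Pearson and Spearman correlations below are an assumption for illustration, and the function names are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Rank correlation: rewards getting the ordering right, not the scale."""
    return pearson(ranks(x), ranks(y))


predicted = [0.1, 0.4, 0.35, 0.9]   # hypothetical predicted protein levels
measured = [5.0, 20.0, 18.0, 60.0]  # hypothetical lab measurements
print(spearman(predicted, measured))  # same ordering -> 1.0
```

Spearman is often the fairer choice when predictors output scores on arbitrary scales, since only the ranking of variants is compared.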