The week of October 10-16 was a busy time in our industry. Hundreds of biostatisticians, genetic epidemiologists, and statistical geneticists gathered in Cambridge, MA, for the annual conference of the International Genetic Epidemiology Society (IGES) on October 10-12, followed by the biennial Genetic Analysis Workshop (GAW) on October 13-16. I had the opportunity to participate in both conferences, and I was joined at IGES by my Golden Helix colleagues Christophe Lambert and Deni Hogan.
International Genetic Epidemiology Society
IGES is the premier annual meeting for the field of statistical genetics, and this year’s conference drew over 350 attendees. IGES generally focuses on analytical methods for solving the field’s most important problems. It was clear from the platform presentations and poster sessions that the field is currently fixated on finding the “missing heritability” that has eluded the current generation of GWAS. Most of the presentations fell into two groups: testing for interaction effects (GxG and GxE) in GWAS data, and analysis of rare variants, especially from sequence data. Another emerging theme at IGES was the resurgence of family-based analysis as a powerful tool for identifying disease susceptibility genes from sequence data.
Next-generation sequencing (NGS) and the analysis of the resulting data were definitely the biggest topic of conversation at the conference. An informal survey taken at the Golden Helix booth showed that whole-genome or whole-exome sequencing is already a major focus for over 30% of conference attendees, and almost 90% expect to be doing similar work soon. The interest in sequence analysis was reflected in the standing-room-only crowd at the “Next Generation Sequencing in Genetic Epidemiological Studies” workshop, held before the conference officially opened on October 10. The workshop featured several experts from academia and industry, who presented current research involving NGS analysis and discussed the best analytical methods and study designs for NGS. Our own Christophe Lambert gave a presentation on imputation with sequence data and offered suggestions for using sequencing to identify the best SNPs for inclusion on disease-specific SNP arrays.
Genetic Analysis Workshop 17
GAW is typically held in conjunction with IGES in even-numbered years. The purpose of GAW is to give researchers the opportunity to analyze cutting-edge data in a controlled environment where analytical methods can be evaluated and best-practice guidelines established. This year’s meeting was the 17th GAW. Participants analyzed exome sequence data drawn from an early release of the 1000 Genomes Project Pilot 3, combined with a complex set of simulated phenotypes. Extended pedigrees were simulated by in-silico “marriages” of the sequenced individuals, and participants could choose to analyze the unrelated subjects, the extended pedigrees, or both. Attendees were required to analyze the data and submit a written report before the meeting. Almost 170 papers were submitted.
Analysis of large-scale sequence data remains uncharted territory for most people in statistical genetics, and the rules for sequence analysis have not yet been written. The novelty of the data led to a variety of innovative analysis methods, including multiple adaptations of standard GWAS techniques and a wide array of data mining and machine learning techniques. Several approaches were devised to test for the cumulative effect of related rare variants, most relying on some form of localized collapsing and/or penalized regression. I don’t think any single method emerged as the best approach, but it was clear that sequence annotations are very important. Sequence data contains extensive genetic variation, and reducing the search space to a set of variants with an increased prior probability of deleterious effects is vital to the success of an analysis. The most promising methods require knowing whether SNPs result in non-synonymous coding changes, whether they are located near splice sites or in functional sites of proteins, and perhaps information about pathway relationships with other genes. It is almost impossible to make sense of the data without good annotation information.
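To make the idea of localized collapsing a little more concrete, here is a minimal Python sketch of one generic carrier-based burden test for a single gene. It assumes each variant already has a minor allele frequency and a functional consequence label from annotation; the `Variant` class, the `DAMAGING` label set, the 1% MAF cutoff, and all function names are hypothetical illustrations. The annotation filter keeps only rare, putatively damaging variants, each subject is collapsed to "carries at least one qualifying minor allele" or not, and a two-sided Fisher exact test compares carrier counts in cases and controls. This is one simple collapsing approach, not the method of any particular GAW submission.

```python
from dataclasses import dataclass
from math import comb

# Hypothetical set of annotation labels treated as "likely deleterious".
DAMAGING = {"missense", "nonsense", "splice_site"}


@dataclass
class Variant:
    gene: str          # gene the variant falls in
    maf: float         # minor allele frequency (assumed known, e.g. from a reference panel)
    consequence: str   # hypothetical functional annotation label


def qualifying_variants(variants, gene, maf_cutoff=0.01):
    """Indices of rare, putatively damaging variants in `gene`."""
    return [
        i for i, v in enumerate(variants)
        if v.gene == gene and v.maf < maf_cutoff and v.consequence in DAMAGING
    ]


def collapse(genotypes, idx):
    """1 if a subject carries at least one minor allele at any qualifying site, else 0."""
    return [int(any(row[i] > 0 for i in idx)) for row in genotypes]


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x carriers among the row1 cases.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))


def burden_test(genotypes, is_case, variants, gene, maf_cutoff=0.01):
    """Carrier-collapsing test of one gene: carrier status vs. case/control status."""
    carriers = collapse(genotypes, qualifying_variants(variants, gene, maf_cutoff))
    pairs = list(zip(carriers, is_case))
    case_carriers = sum(1 for c, y in pairs if c and y)
    case_non = sum(1 for c, y in pairs if not c and y)
    ctrl_carriers = sum(1 for c, y in pairs if c and not y)
    ctrl_non = sum(1 for c, y in pairs if not c and not y)
    return fisher_two_sided(case_carriers, case_non, ctrl_carriers, ctrl_non)


if __name__ == "__main__":
    variants = [
        Variant("GENE1", 0.005, "missense"),    # rare, damaging: kept
        Variant("GENE1", 0.002, "synonymous"),  # rare but synonymous: filtered out
        Variant("GENE1", 0.20, "missense"),     # common: filtered out
        Variant("GENE2", 0.003, "nonsense"),    # different gene
    ]
    # Rows are subjects, columns are minor-allele counts at each variant.
    genotypes = [
        [1, 0, 0, 0],
        [1, 0, 1, 0],
        [0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
    ]
    is_case = [True, True, True, False, False, False]
    print("GENE1 p =", burden_test(genotypes, is_case, variants, "GENE1"))
```

In this toy data, two of three cases carry the qualifying GENE1 variant and no controls do, yet the p-value is only 0.4: with six subjects, even a perfect-looking enrichment carries little evidence. The annotation filter also does real work here, since the synonymous and common variants would otherwise turn controls into carriers.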
Beyond annotations, we all relearned some old lessons at GAW: rare susceptibility variants are hard to detect, variants with large effects are easier to detect, sample size matters, study design matters, and population structure must always be accounted for. Despite the relatively small dataset (~24,000 variants across the exons of ~3,500 genes), false positive associations were rampant in almost every analysis method. We have already seen how difficult it is to replicate GWAS findings, and we must all be very careful as we move forward with sequence analysis to avoid an even greater preponderance of irreproducible results.
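A back-of-the-envelope calculation shows how quickly false positives can pile up even in a dataset of this size. Assuming, purely for illustration, independent tests at a nominal α = 0.05 with no true associations, the expected number of false positives across ~24,000 single-variant tests, the Bonferroni-corrected per-test threshold, and the corresponding threshold if variants are first collapsed into ~3,500 gene-level tests are:

$$
\begin{aligned}
E[\text{false positives}] &= m\,\alpha = 24{,}000 \times 0.05 = 1{,}200 \\
\alpha_{\text{variant}} &= \frac{0.05}{24{,}000} \approx 2.1 \times 10^{-6} \\
\alpha_{\text{gene}} &= \frac{0.05}{3{,}500} \approx 1.4 \times 10^{-5}
\end{aligned}
$$

Bonferroni is only one possible correction and is conservative when tests are correlated (for example, through linkage disequilibrium); it also does nothing about spurious signals driven by population structure, which has to be handled in the analysis model itself.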
IGES and GAW are very important conferences in the field, and we can’t wait to see everyone at IGES next year! …And that’s my two SNPs.