In case you missed our live event yesterday, I wanted to share a link to the webcast recording: New Enhancements: GWAS Workflows with SVS. There were several questions asked, so we’ve also shared the Q & A session below!
Question: Are these enhancements priced as a separate feature?
Answer: No. SVS is a constantly evolving platform, so everything you see in this webcast is part of the standard SVS package, with the sole exception of the genotype imputation capabilities. Imputation is generally run on a server, because computing haplotypes across tens of thousands of samples is a large-scale computation. So we do have a separate server package for genotype imputation; otherwise, your standard SVS license includes everything you saw here, including the new enhancements.
Question: How many samples are needed to use these methods?
Answer: 500 samples is a bit low, arguably not even statistically sound. Generally you need thousands: 2,000 to 3,000 is sometimes enough, and you will often hear 5,000 cited as the mark for being in the safe zone. Our methods do scale very well. We put a lot of work in previous releases into making the GBLUP GRM computation scale to kinship matrices so large that, with standard approaches, they would require more memory than is available on a standard workstation. So with SVS you can work with very large numbers of samples without worrying.
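To give a rough sense of why kinship matrices outgrow workstation memory, here is a back-of-the-envelope sketch. It assumes a dense n-by-n matrix of 8-byte floating-point values held entirely in memory; the function name and the sample counts are illustrative assumptions, and this says nothing about how SVS itself stores or computes the matrix.

```python
def dense_kinship_bytes(n_samples, bytes_per_value=8):
    """Memory needed for a dense n x n matrix (assumed 8-byte floats)."""
    return n_samples * n_samples * bytes_per_value


# Illustrative sample counts only; memory grows with the square of n.
for n in (5_000, 50_000, 200_000):
    gib = dense_kinship_bytes(n) / 2**30
    print(f"{n:>7} samples: {gib:,.1f} GiB")
```

Because the cost is quadratic, ten times as many samples means a hundred times as much memory for the naive dense approach.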
Question: If one retains hundreds of thousands of very rare SNPs from an exome array in a GWAS (not filtered out by MAF), how does this affect the observed versus expected plots of the p-values?
Answer: Rare variants, whether they come from an exome array or from an NGS study, do need to be considered separately. At the end of the day, they will not necessarily inflate your QQ plots, but they cannot be given much weight and so do not add a lot of value to your study: a rare minor allele simply does not provide enough data points for a test statistic to be useful. In this case, we would suggest using a collapsing method that aggregates multiple rare variants in a given gene to obtain larger counts. You can also do that manually, as we have the ability to compute the total count of rare variants per gene. You can do that separately for your rare variants, merge the results with your common variants, and then run a GWAS on the combined set of variants. The CMC method does something similar in an automated fashion, and it is also available in SVS. So there are a couple of different strategies for incorporating rare variants into this type of analysis.
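The manual collapsing idea, counting rare minor alleles per gene for each sample, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name, the 1% MAF cutoff, the variant and gene labels, and the toy genotypes are all hypothetical, and this is not the SVS implementation or the CMC method itself.

```python
from collections import defaultdict


def rare_variant_burden(genotypes, variant_gene, maf, maf_threshold=0.01):
    """Sum rare minor-allele counts per gene for each sample.

    genotypes:    sample -> {variant: minor-allele count (0, 1, 2) or None}
    variant_gene: variant -> gene label
    maf:          variant -> minor allele frequency
    The 0.01 threshold is an assumed cutoff for "rare".
    """
    rare = {v for v, freq in maf.items() if freq < maf_threshold}
    burden = {}
    for sample, calls in genotypes.items():
        per_gene = defaultdict(int)
        for variant, count in calls.items():
            # Skip common variants and missing calls.
            if variant in rare and count is not None:
                per_gene[variant_gene[variant]] += count
        burden[sample] = dict(per_gene)
    return burden


# Toy, made-up data: two rare variants in GENE_A, one common variant in GENE_B.
maf = {"rs1": 0.002, "rs2": 0.004, "rs3": 0.25}
variant_gene = {"rs1": "GENE_A", "rs2": "GENE_A", "rs3": "GENE_B"}
genotypes = {
    "s1": {"rs1": 1, "rs2": 0, "rs3": 2},
    "s2": {"rs1": 0, "rs2": 0, "rs3": 1},
    "s3": {"rs1": 1, "rs2": 1, "rs3": 0},
}

burden = rare_variant_burden(genotypes, variant_gene, maf)
print(burden)
```

Each per-gene count then behaves like a single "variant" with larger counts, which is what lets it be merged with the common variants before running the association test.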
Question: Can I use dbSNP on goat data? You mentioned human genome, just wondering if you have the feature for goats within the program?
Answer: We have quite a few genomes available, covering both model organisms and commercially researched animals used in agrigenomics, as well as many plant species. As an SVS user, you can request that we help curate anything you specifically need that is not in our existing public catalog. We also have conversion tools that let you curate data from your own consortium or institute, such as allele frequency tracks and things like that. We curate dbSNP for pretty much every genome it is available for, alongside a curated reference genome, and I wouldn’t be surprised if goat is one of them.
If you have any questions of your own after watching the webcast, please feel free to send them to [email protected]!