Eliminate Low-Quality Samples and Markers
In Part I of this GWAS Analysis series, Dr. Eli Sward gave us a great overview of the value SVS provides in managing the quality of your SNP or NGS data to maintain the power and accuracy of your GWAS. He also gave a snapshot of what a typical genotype spreadsheet may look like.
Today, I’m going to discuss some basic filtering steps to eliminate low-quality samples and markers.
Focusing on the sample quality statistics, call rate is one default output among the many statistics the user can select (Figure 2). The selected statistics are generated in a new spreadsheet, which the user can assess and use to filter out low-quality samples. One simple way to review the call rate of all samples is to right-click on the column header and generate a histogram (Figures 3 and 4).
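For readers who want to see what the call rate statistic and its histogram amount to outside of SVS, here is a minimal Python sketch. It is not SVS code: the toy genotype table, the "?_?" missing-call code, and the function names are all assumptions made for illustration. Call rate is taken here as the fraction of markers with a non-missing genotype call for a given sample.

```python
from collections import Counter

MISSING = "?_?"  # assumed missing-call code for this sketch

# Hypothetical toy genotype table: sample ID -> one call per marker.
genotypes = {
    "S1": ["A_A", "A_B", "B_B", "A_A"],
    "S2": ["A_A", MISSING, "B_B", MISSING],
    "S3": ["A_B", "A_B", MISSING, "A_A"],
    "S4": [MISSING, MISSING, MISSING, "A_A"],
}

def call_rate(calls):
    """Fraction of markers with a non-missing genotype call."""
    called = sum(1 for g in calls if g != MISSING)
    return called / len(calls) if calls else 0.0

def histogram(values, bin_width=0.25):
    """Count values per bin; a value of exactly 1.0 is folded into the last bin."""
    n_bins = round(1 / bin_width)
    counts = Counter(min(int(v / bin_width), n_bins - 1) for v in values)
    return [counts.get(i, 0) for i in range(n_bins)]

rates = {sample: call_rate(calls) for sample, calls in genotypes.items()}
bins = histogram(rates.values())

# Print a simple text histogram of call rates across samples.
for i, n in enumerate(bins):
    print(f"{i * 0.25:.2f}-{(i + 1) * 0.25:.2f} {'#' * n}")
```

With real data the bins would be much narrower and concentrated near 1.0, which is what makes a sensible cutoff easy to spot by eye.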
After viewing the histogram, the user can get a feel for how many samples have an ideal call rate and choose a threshold for filtering out low call rate samples. This threshold can be set by going back to the sample statistics spreadsheet, right-clicking on the call rate column, and selecting Activate By Threshold (Figure 5). After the threshold is applied, samples with a low call rate are inactivated in the spreadsheet (Figure 6). The number of remaining active samples is shown in the top right corner of the spreadsheet; in Figure 6 you’ll see 468 samples remaining.
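Conceptually, thresholding just splits the samples into an active and an inactive set. The sketch below illustrates that idea in plain Python; it is not how SVS implements Activate By Threshold. The call rates, the 0.95 cutoff, and the function name are hypothetical values chosen for the example.

```python
# Hypothetical per-sample call rates, as they might appear in a statistics table.
call_rates = {"S1": 0.998, "S2": 0.912, "S3": 0.987, "S4": 0.954, "S5": 0.999}

def activate_by_threshold(rates, minimum):
    """Split samples into active (>= minimum) and inactive (< minimum)."""
    active = {s for s, r in rates.items() if r >= minimum}
    inactive = set(rates) - active
    return active, inactive

# 0.95 is an assumed cutoff for illustration; choose yours from the histogram.
active, inactive = activate_by_threshold(call_rates, minimum=0.95)

print(f"{len(active)} of {len(call_rates)} samples remain active")
print("Inactivated:", sorted(inactive))
```

Note that nothing is deleted at this stage: inactive samples are only flagged, so the threshold can be revised without regenerating the statistics.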
Now the user can apply this filtered sample set to the original genotypes spreadsheet. From the sample statistics spreadsheet, click Select -> Apply Current Selection to Second Spreadsheet (Figure 7). This takes the remaining active samples with an ideal call rate and applies the same selection to the samples in the genotypes spreadsheet (Figure 8).
As shown in Figure 9, the user can then choose to subset their data to only the active columns or rows (rows in this example, since samples are stored as rows).
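The two steps above (carrying a selection over to a second table, then subsetting to the active rows) can be pictured with a small Python sketch. Again, this is only an illustration under assumed data, not SVS code: the toy genotype table, the set of active sample IDs, and both function names are hypothetical.

```python
# Hypothetical toy genotype table: sample ID -> one call per marker.
genotypes = {
    "S1": ["A_A", "A_B", "B_B"],
    "S2": ["A_A", "?_?", "?_?"],
    "S3": ["A_B", "A_B", "A_A"],
    "S4": ["?_?", "?_?", "A_A"],
}

# Assumed result of an earlier call rate filter on the statistics table.
active = {"S1", "S3"}

def apply_selection(table, active_ids):
    """Mark each row active or inactive without dropping anything."""
    return {sample: sample in active_ids for sample in table}

def subset_active_rows(table, status):
    """Build a new table containing only the rows marked active."""
    return {s: calls for s, calls in table.items() if status[s]}

status = apply_selection(genotypes, active)
filtered = subset_active_rows(genotypes, status)

print(list(filtered))   # samples kept in the subset
print(len(genotypes))   # the original table is left unchanged
```

Keeping the selection step separate from the subset step mirrors the workflow described above: the original table stays intact, and the subset is a new table built from it.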
This process is an easy way to generate sample and marker statistics, quickly assess the results across your population with plots, and rapidly filter out low-quality markers and samples. It is only the beginning of the quality control steps behind a GWAS analysis. The next part of this blog series covers how to prune markers for linkage disequilibrium and investigate sample relatedness to further improve the power of your GWAS. If you have any specific questions, please reach us at [email protected]!