Created: July 2, 2008

User Level: Intermediate

Products: HelixTree, CNAM

Step 4. Check Concordance Between Reported Gender and Imputed Gender

Now that you have imputed gender values, you can compare these to the reported gender for each sample. First you need to import your phenotype information and join it with your imputed values.

If your phenotype data is stored as a CSV or TXT file, select >File >Import Data >Import ASCII File from the Project Navigator. Locate the file you want to import, specify whether it is comma-, tab-, or space-delimited, and enter the number of the column containing the sample names to be used as row labels. For other import options, see Chapter 4 of the manual.

IMPORTANT: In order to properly join spreadsheets, make sure your sample names in both spreadsheets match exactly.

Figure 1. Phenotype data joined with chromosome X mean threshold data.

From your phenotype spreadsheet select >File >Join Spreadsheets on Row Label. Highlight the X chromosome mean threshold spreadsheet and click OK. A new spreadsheet will be created with both phenotype data and the X chromosome mean threshold data (Figure 1). Again, ‘0’ represents male and ‘1’ female.
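The join on row labels can be pictured with a short Python sketch. This is not HelixTree's implementation; the function name, the column names, and the sample names S1 to S3 are hypothetical, and keeping only the samples present in both tables is an assumption made for illustration. It also shows why sample names must match exactly: a stray trailing space is enough to leave a sample out of the join.

```python
def join_on_row_label(left, right):
    """Join two {sample_name: {column: value}} tables on exact sample name.

    Assumption for this sketch: only samples present in both tables are kept.
    Returns the joined table and a sorted list of names found in only one table.
    """
    joined = {}
    for name, left_row in left.items():
        if name in right:
            # Merge the columns of both rows for the matching sample.
            joined[name] = {**left_row, **right[name]}
    unmatched = sorted(set(left) ^ set(right))
    return joined, unmatched


# Hypothetical data: 0 = male, 1 = female, as in the tutorial.
phenotype = {
    "S1": {"reported_gender": 1},
    "S2": {"reported_gender": 0},
    "S3 ": {"reported_gender": 0},  # trailing space: will not match "S3"
}
imputed = {
    "S1": {"imputed_gender": 1},
    "S2": {"imputed_gender": 0},
    "S3": {"imputed_gender": 0},
}

joined, unmatched = join_on_row_label(phenotype, imputed)
print(joined)
print(unmatched)
```

Inspecting the list of unmatched names before joining is a quick way to catch labels that differ only by whitespace or letter case.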

Rather than visually inspect every row, a Gender Concordance script is available for download from our Add-on Scripts Repository enabling you to simultaneously check concordance across all samples.

Download this script and save it in the following directory: ../HelixTree/scriptsHT/user/Spreadsheet/Scripts/

From the joined spreadsheet select >Scripts >Gender Concordance. You will be asked to specify the column numbers of the reported gender and the imputed gender. In the example above, these are 6 and 8, respectively. Click OK.

A new spreadsheet will be created with an additional binary column. ‘1’ indicates that the reported and imputed genders are in concordance, and ‘0’ indicates that they are not.
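The logic of the concordance column can be sketched in a few lines of Python. This is only an illustration of the comparison, not the Gender Concordance script itself; the function name, the column names, and the sample data are hypothetical, and columns are referenced by name here rather than by number.

```python
def add_concordance(rows, reported_col, imputed_col):
    """Return copies of the rows with an extra 'concordant' column.

    'concordant' is 1 when the reported and imputed genders are equal
    and 0 otherwise.
    """
    result = []
    for row in rows:
        flag = 1 if row[reported_col] == row[imputed_col] else 0
        result.append({**row, "concordant": flag})
    return result


# Hypothetical joined spreadsheet: 0 = male, 1 = female.
rows = [
    {"sample": "S1", "reported_gender": 1, "imputed_gender": 1},
    {"sample": "S2", "reported_gender": 0, "imputed_gender": 0},
    {"sample": "S3", "reported_gender": 1, "imputed_gender": 0},
]

checked = add_concordance(rows, "reported_gender", "imputed_gender")
discordant = [row["sample"] for row in checked if row["concordant"] == 0]
print(discordant)
```

Listing only the discordant samples keeps the follow-up focused on the rows that need a closer look.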

Figure 2. Gender concordance results based on X chromosome LogRs.

The concordance results for the HapMap samples used throughout this tutorial (Figure 2) show one sample, NA10854, whose genetically imputed gender is not consistent with its reported gender. This sample’s X chromosome LogR mean value of -0.1006417 seems to indicate that it has only one X chromosome.

Repeat this step for the Y chromosome. Analyzing the same HapMap samples produces the results shown in Figure 3.

Figure 3. Gender concordance results based on Y chromosome LogRs.

Notice that when using Y chromosome LogRs, the reported and imputed genders are in concordance for sample NA10854. Based on both X and Y chromosome data, it seems this sample has only one X chromosome and no Y chromosome. However, such a strong conclusion cannot be drawn without further investigation, as this could be the result of a genotyping error.
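One way to organize this follow-up is to place the X and Y concordance flags side by side and label each sample. The sketch below is a hypothetical helper, not part of HelixTree. The NA10854 flags reflect the results described above (discordant on X, concordant on Y); SAMPLE_A and SAMPLE_B are made-up entries included only to show the other cases.

```python
def summarize_concordance(x_flags, y_flags):
    """Label each sample found in both inputs by its X and Y concordance flags.

    Each input maps sample name -> 1 (concordant) or 0 (not concordant).
    """
    summary = {}
    for name in sorted(set(x_flags) & set(y_flags)):
        x_ok, y_ok = x_flags[name], y_flags[name]
        if x_ok and y_ok:
            summary[name] = "concordant on both"
        elif not x_ok and not y_ok:
            summary[name] = "discordant on both"
        else:
            # X and Y disagree: investigate further before drawing conclusions.
            summary[name] = "conflicting X and Y evidence"
    return summary


x_flags = {"NA10854": 0, "SAMPLE_A": 1, "SAMPLE_B": 0}
y_flags = {"NA10854": 1, "SAMPLE_A": 1, "SAMPLE_B": 0}

summary = summarize_concordance(x_flags, y_flags)
for name, status in summary.items():
    print(name, status)
```

Samples labeled as conflicting are the ones where neither chromosome alone settles the question, which matches the caution above about genotyping error.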