HelixTree Manual

Version 6.4.0

Copyright 2000-2008


HelixTree is a premier tool for extracting useful information from your data. At its heart is a sophisticated data analysis engine enhanced to support the analysis of genetic data. Golden Helix offers three ways for you to learn the capabilities and features of HelixTree.

The first and most immediate approach is this manual. In it you will find directions for installing HelixTree, tutorials on its use, and a comprehensive reference. You will need to read the section on installing the product, and we strongly recommend that you also read the tutorials.

We recognize the difficulty of learning any new skill from a book or manual, no matter how well it is written. So, as a second approach to learning HelixTree, we strongly recommend that you visit our web site at http://www.goldenhelix.com and sign up for a web-based seminar. The seminar can be set up to accommodate your schedule, and its content can be tailored to meet your specific needs. Time spent in the seminar has proven to be a very effective way for new users to jump-start their productivity with HelixTree and for seasoned users to quickly become familiar with new features.

Finally, Golden Helix offers on-site training on HelixTree, which is especially beneficial for teams. Contact our sales staff to discuss your on-site training options.

We are always looking for ways to improve HelixTree, both in its basic capabilities and in its ability to interoperate with other tools used in your work. HelixTree now has a feedback mechanism intended to encourage you to communicate feature requests, bugs(!), and your experiences with the product.

We look forward to hearing from you.

Christophe Lambert
President & CEO of Golden Helix

Acknowledgements

HelixTree would not exist without the generous contributions of many minds and hearts. We would particularly like to thank the following people: Alan Menius, Meg Ehm, Dmitri Zaykin, Mike Mosteller, Tony Segreti, Allen Roses, and many other visionary GlaxoSmithKline scientists and managers worldwide; Dr. Douglas Hawkins of the University of Minnesota; Bret Musser of Merck; Albert Seymour of Pfizer; Dr. Peter Westfall of Texas Tech University; Dr. S. Stanley Young of CGStat LLC; Dr. Sally John of The University of Manchester; Dr. Chao-Qiang Lai of the Human Nutrition Research Center on Aging at Tufts University; Steve Dubnoff at Circle Systems, Inc.; our colleagues at INTEC Web & Genome; and all the helpful folks at Affymetrix. Finally, we’d like to thank the NIH National Institute of General Medical Sciences for its generous funding support through the SBIR program.

Trademarks Used

HelixTree is a registered trademark of Golden Helix Inc. Affymetrix, GeneChip and the Affymetrix logo are registered trademarks used by Affymetrix, Inc. Microsoft, Microsoft SQL, SQL Server, Transact-SQL, Excel, Access and ODBC are registered trademarks of Microsoft, Inc. Stat/Transfer is a registered trademark of Circle Systems, Inc. Oracle and Oracle PL/SQL are registered trademarks of Oracle, Inc. IBM and DB2 are registered trademarks of IBM. SAS is a registered trademark of SAS, Inc. Sybase is a registered trademark of Sybase, Inc. Any other incidentally used names that are registered trademarks are trademarks of their respective owners.

Contents
I  Installing HelixTree and Acquiring Data
1 Installing and Initializing HelixTree
 1.1 Installation Overview
 1.2 Release Notes
2 Welcome to HelixTree
 2.1 Goals for this Chapter
 2.2 Recursive Partitioning Primer
 2.3 The HelixTree Basic Workflow
 2.4 Tutorial 1: Performing the Basic Workflow in GUI Mode
 2.5 Tutorial 2: Performing the Basic Workflow in Scripting Mode
 2.6 Tutorial 3: Detailed Guide to a Standard Case/Control Association Study
3 Navigating the Main Screen
 3.1 Main Screen Overview
 3.2 Project Viewer Window
 3.3 Navigator Nodes
 3.4 The File Menu
 3.5 The Tools Menu
 3.6 The CNAM Menu
 3.7 The PBAT Menu
 3.8 The Help Menu
4 Importing Your Data Into HelixTree
 4.1 General Considerations
 4.2 Mathematical Considerations
 4.3 Importing Data
 4.4 Importing Copy Number Data
 4.5 Importing Family-Based Data
5 Scripting and Other Integrated Statistical Tools
 5.1 Integrated Tools Overview
 5.2 The Python Shell Window
 5.3 Running Scripts
 5.4 Selecting a Script Server
 5.5 Example Scripts
 5.6 Scripting Reference
 5.7 S-PLUS Integration
 5.8 R Integration
 5.9 PBAT Integration
6 Using the Spreadsheet Viewer
 6.1 Spreadsheet Overview
 6.2 Manipulating, Filtering and Preparing Data Using the Spreadsheet
 6.3 The File Menu
 6.4 The Edit Menu
 6.5 The Analysis Menu
 6.6 The Genetics Menu
 6.7 The Help Menu
II  Recursive Partitioning
7 Interactive Tree Analysis
 7.1 Tree Analysis Overview
 7.2 Setting Options for Tree Analysis
 7.3 Working with Nodes
 7.4 Manually Splitting Nodes
 7.5 Defining Splits
 7.6 The File Menu
 7.7 The Tree Menu
 7.8 The Font Menu - Resizing and Formatting Tree View
8 Prediction Recipes
 8.1 Training and Validation Recipe
 8.2 Predicting An Unknown Response
9 Random Tree Generation
 9.1 Random Tree Overview
 9.2 Creating a Random Tree Model
 9.3 Multitree Model Browsing - Tree View
10 Multivariate Tree Analysis
 10.1 Multivariate Analysis Overview
 10.2 Using More Than One Dependent Variable
11 Histogram Node Analysis
 11.1 Histogram Overview
 11.2 Viewing Split Data Histograms
12 The Observation Distance Matrix
 12.1 Observation Distance Matrix Overview
 12.2 Viewing Observation Distance Matrix
 12.3 Printing and Saving the Observation Distance Matrix
13 The Correlation Interaction View
 13.1 Correlation Interaction Overview
 13.2 Viewing Correlation Interactions
14 Linkage Disequilibrium View
 14.1 Linkage Disequilibrium Overview
 14.2 Plotting Linkage Disequilibrium
 14.3 The File Menu
 14.4 LD Computation
15 Hardy Weinberg Equilibrium View
 15.1 Plotting Hardy Weinberg
16 P-Value and Spreadsheet Plots
 16.1 Plotting P-Values and Spreadsheet Columns
 16.2 P-Value Plot Types
 16.3 Plot Functions
 16.4 The File Menu
17 Haplotype Regression and the Allele Table
 17.1 Plotting Haplotype Regression
 17.2 Displaying the Allele Table
18 Genetic Association Tests
 18.1 Genetic Association Tests Overview
 18.2 Genetic Models and Other Genetic Tests
 18.3 Test Statistics
 18.4 Missing Values
 18.5 Multiple Testing Corrections
 18.6 Correction for Stratification
 18.7 Overall Marker Statistics
 18.8 Using the Association Test Window
 18.9 Using the Separate Principal Components Analysis Window
 18.10 Using the Separate General Marker Statistics Window
19 EM Haplotype Frequency Estimation
 19.1 Haplotype Frequency Estimation Overview
 19.2 Window Display and Navigation
 19.3 Example of Using the Patient List
 19.4 EM Table and CHM Table
 19.5 The Diplotype Table
20 Two-Loci Genetic Plot
 20.1 Two-Loci Genetic Plot Overview
21 Runs of Homozygosity
 21.1 Runs of Homozygosity Overview
 21.2 Using Runs of Homozygosity
 21.3 The ROH Algorithm
22 Text Viewer
 22.1 Text Viewer Overview
 22.2 Navigating the Text Viewer Menus
23 PBAT Family-Based Analysis (Optional Module)
 23.1 PBAT Family-Based Analysis Overview
 23.2 Using PBAT Capabilities through HelixTree
 23.3 PBAT Power Calculations
 23.4 PBAT Data Analysis
 23.5 PBAT Data Analysis for Copy Number Variation
 23.6 A Glossary of Terms Used in Family-Based Analysis
24 Regression Analysis (Optional Module)
 24.1 Regression Analysis Overview
 24.2 Performing Analysis
25 Copy Number Analysis (Optional Module)
 25.1 Copy Number Analysis Overview
 25.2 Preparing the Log2 Ratio Data
 25.3 Using the Copy Number Analysis Segmentation Tool
 25.4 Outputs of the Copy Number Analysis Segmentation Tool
 25.5 Import LogR DSF Values Directly
 25.6 Save LogR DSF Values as CNT Files
 25.7 Using the LogR Association Tests and PCA Window
 25.8 Visualizing Copy Number Analysis Results
 25.9 Copy Number Analysis Examples and Tutorials
 25.10 Golden Helix Copy Number Segmentation Algorithm
 25.11 Workflow for Reading Affymetrix CEL Files
III  The Science Behind HelixTree
26 Formulas and Theories
 26.1 Split-Prediction Methodology
 26.2 Normally Distributed Response Binomial Predictor
 26.3 Normally Distributed Response Continuous-Ordinal Predictor
 26.4 Normally Distributed Response Categorical Predictor
 26.5 Linear Regression From a Tree Node
 26.6 Haplotype Trend Regression (HTR) with Continuous Response
 26.7 Composite Haplotype Method (CHM)
 26.8 Haplotype Trend Regression (HTR) with Continuous Response and Covariates (Optional Module)
 26.9 Stepwise Regression (Optional Module)
 26.10 Categorical Covariates and Interaction Terms (Optional Module)
 26.11 Results from Linear Regression (Optional Module)
 26.12 Binomially Distributed Response Binary Predictor
 26.13 Binomially Distributed Response Continuous/Ordinal Predictor
 26.14 Binomially Distributed Response Categorical Predictor
 26.15 Logistic Regression From a Tree Node
 26.16 Haplotype Trend Regression (HTR) with Binomial Response
 26.17 Haplotype Trend Regression (HTR) with Binomial Response and Covariates (Optional Module)
 26.18 Results from Logistic Regression (Optional Module)
 26.19 Categorical Response
 26.20 The False Discovery Rate and the Simes Method
 26.21 Permutation Test Methodology
 26.22 Methods for the Genetic Association Tests
 26.23 Formulas for Principal Components Analysis
 26.24 Methods for Obtaining General Marker Statistics
A EULA
B Installing the Third-Party Condor® Package
 B.1 Installing Condor® Overview
 B.2 Downloading and Using the Installation Wizard
 B.3 Troubleshooting Techniques and Common Issues
C Extracting Affymetrix Copy Number Data for use in HelixTree
 C.1 Extracting Affymetrix Copy Number Data Overview
 C.2 Creating CNT Files using the Affymetrix CNAT Batch Analysis Tool
 C.3 Creating CNCHP Files using Affymetrix Genotyping Console 2.0
 C.4 Affymetrix CNT File Format
D Exporting Data from BeadStudio
 D.1 Exporting Data From BeadStudio Overview
 D.2 Exporting Genotype Data using the BeadStudio Final Report
 D.3 Exporting Copy Number Data using the HelixTree DSF Plug-In
E Platform Notes
 E.1 Microsoft Windows
 E.2 Linux
 E.3 Mac OS X
F Bug Fix History
 F.1 Bugs Fixed by Version
G Bibliography