Add-On Scripts Repository for SVS

Here you will find a collection of Python scripts submitted by Golden Helix developers and our customers. All scripts are provided at no additional cost, so feel free to download, use, and even enhance them!

Share your scripts with the Golden Helix Community

If you have written any scripts and would like to share them with other SVS users, we encourage you to email a *.txt or *.py file to community@goldenhelix.com with any accompanying documentation or special instructions. Once we test your script and check its validity, we'll post it on this page for others to download.


Stay informed about new scripts by subscribing to the technical support bulletins feed.

What is Python?

Python is a clear and powerful object-oriented programming language, comparable to Perl, Ruby, Scheme, or Java. Integrating Python into SVS provides programmatic access to many of the software's features, enabling users to augment existing tools, create entirely new ones, automate workflows, integrate with other programs, and more.

Python Learning Resources
Date Script
2016-10-19 Annotate and Filter Variants

This script will annotate and filter variants based on information found in Variant, Interval, and Gene sources. More info »

2016-10-19 Variant Classification

This script will perform Variant Classification on a marker mapped genotype spreadsheet. More info »

2016-08-03 Activate ATCG SNPs to flip strand or to exclude SNPs

These scripts can be used to identify SNPs with ambiguous strand orientation (A/T and C/G SNPs) by comparing a genotype dataset with a reference dataset, such as HapMap data. More info »
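
A minimal sketch of the underlying idea, assuming single-letter allele codes and that the frequency of the same allele is already known in both datasets. A/T and C/G SNPs read identically on both strands, so only a frequency comparison can suggest whether to flip or exclude them. The function names and the 0.1 margin are hypothetical and are not taken from the scripts.

```python
# Hypothetical sketch: A/T and C/G SNPs read the same on both strands,
# so their orientation has to be inferred from allele frequencies.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele1, allele2):
    """Return True for A/T or C/G SNPs (the two alleles are complementary)."""
    return COMPLEMENT.get(allele1.upper()) == allele2.upper()


def classify_ambiguous_snp(freq_data, freq_ref, margin=0.1):
    """Compare the frequency of the same allele in study and reference data.

    Returns "exclude" if either frequency is within `margin` of 0.5
    (too close to call), "flip" if the frequencies fall on opposite
    sides of 0.5, and "keep" otherwise. The margin is an assumption.
    """
    if abs(freq_data - 0.5) < margin or abs(freq_ref - 0.5) < margin:
        return "exclude"
    if (freq_data - 0.5) * (freq_ref - 0.5) < 0:
        return "flip"
    return "keep"
```

SNPs whose frequencies sit near 0.5 cannot be oriented this way, which is why the sketch excludes them rather than guessing.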

2016-04-22 Run Multiple Genotype Association Tests

This script runs genotypic association tests on multiple dependent phenotype columns. More info »

2016-03-04 Illumina Multiple Final Report File Import

This script is designed to automate the import and merging of multiple Illumina Final Report files. More info »

2016-03-04 Multilocus Risk Score

This script will output the risk scores for each sample for each score provided. This script uses the numpy package. More info »
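
As an illustration only, the sketch below computes a weighted multilocus score for one sample as the sum of (weight * number of risk alleles) over SNPs. The A_G genotype format, skipping missing calls, and the function name are assumptions; the actual script uses numpy and may define the score differently.

```python
MISSING = {"?", "?_?", ""}


def risk_score(genotypes, risk_alleles, weights):
    """Weighted risk-allele count for one sample (hypothetical helper).

    genotypes[i] is an "A_G"-style call for SNP i, risk_alleles[i] is the
    risk allele and weights[i] its weight (for example a log odds ratio).
    Missing calls contribute nothing; that choice is an assumption.
    """
    score = 0.0
    for genotype, risk, weight in zip(genotypes, risk_alleles, weights):
        if genotype in MISSING:
            continue
        score += weight * genotype.split("_").count(risk)
    return score
```

For example, genotypes A_G, G_G, A_A with risk allele G and weights 0.5, 1.0, 2.0 give 0.5 + 2.0 + 0 = 2.5.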

2015-10-06 Correct P-Values for Multiple Tests

This script takes a column of p-values and outputs several multiple testing corrections including Bonferroni, FDR (Storey 2002), BH FDR (Benjamini-Hochberg 1995) and BY FDR (Benjamini-Yekutieli 2001). More info »
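
A minimal pure-Python sketch of three of these corrections (Bonferroni, Benjamini-Hochberg, Benjamini-Yekutieli), assuming a complete list of p-values with no missing entries. Storey's method additionally estimates the proportion of true null hypotheses and is not shown. The function names are hypothetical.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def step_up_fdr(pvalues, yekutieli=False):
    """Benjamini-Hochberg adjusted p-values (Benjamini-Yekutieli if yekutieli=True).

    The adjusted value for ascending rank k is the minimum over j >= k of
    p_(j) * m * c / j, where c = 1 for BH and c = sum(1/i, i = 1..m) for BY.
    """
    m = len(pvalues)
    c = sum(1.0 / i for i in range(1, m + 1)) if yekutieli else 1.0
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m * c / rank)
        adjusted[i] = running_min
    return adjusted
```

The BY correction is the BH procedure scaled by the harmonic sum c, which keeps FDR control under arbitrary dependence between tests.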

2015-08-13 Export LongGen File

This script will export genotypic spreadsheet data into PLINK’s LongGen file format. More info »

2015-05-08 Import Sorted VCF Files

This function imports 1000 Genomes .vcf file data into multiple spreadsheets. Special handling is provided for genotype data. The user can choose to import one VCF file or several VCF files simultaneously. More info »

2015-04-22 Add Annotation Data to Marker Map From Spreadsheet

This function takes the marker map applied to the current spreadsheet and adds specified annotation data from overlapping interval(s) to each marker in the marker map. More info »

2015-03-20 Rename Genotypes

This script scans the genotypic columns to find all existing genotypes, and then prompts for replacements. The resulting spreadsheet has the same dimensions with the appropriate genotype substitutions. More info »

2015-01-21 Join or Merge Several Spreadsheets

This function allows the user to merge several spreadsheets in one dialog, saving the user from having to merge each spreadsheet individually. More info »

2014-12-17 ANOVA with Phenotype and SNPs

This function makes use of the scipy package, specifically the scipy.stats.f_oneway and scipy.stats.kruskal functions. This requires a numeric phenotype column and several genotype columns which provide the grouping structure in each test. More info »

2014-08-20 Select Random Subset by Category

This script prompts the user for a fraction and then activates that fraction of randomly selected samples from each unique category of a genotypic, categorical or binary column. More info »
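
A sketch of stratified random selection, assuming the category values are held in a list (None for missing) and that the per-category count is rounded with Python's round(). How the script itself rounds is not documented here, and the function name is hypothetical.

```python
import random
from collections import defaultdict


def random_subset_by_category(categories, fraction, seed=None):
    """Return sorted row indices to activate (hypothetical helper).

    From each unique category, round(fraction * group size) rows are drawn
    at random without replacement. None is treated as missing and skipped.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row, category in enumerate(categories):
        if category is not None:
            groups[category].append(row)
    chosen = []
    for rows in groups.values():
        chosen.extend(rng.sample(rows, round(fraction * len(rows))))
    return sorted(chosen)
```

Sampling within each category keeps the category proportions of the subset close to those of the full column.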

2014-06-23 Extract Info from Regression Stats Viewer

This script scans the Regression Statistics Viewer output and prints out the p-value after correcting for any covariates. More info »

2014-06-12 Export Impute2 Genotype Probabilities

From a spreadsheet containing marker-mapped genotypic columns, this script saves the spreadsheet as a series of IMPUTE2 chr*.gen files, with the corresponding chr*.sample files, to the specified directory. More info »
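
For reference, each line of an IMPUTE2 .gen file starts with a SNP ID, an rsID, the base-pair position and the two alleles, followed by three genotype probabilities (AA, AB, BB) per sample. The sketch below writes one line from hard genotype calls; the A_G genotype format, writing missing calls as 0 0 0, and the function name are assumptions.

```python
def impute2_gen_line(snp_id, rs_id, position, allele_a, allele_b, genotypes):
    """Format one .gen line from "A_G"-style hard calls (hypothetical helper).

    Each sample contributes three probabilities for AA, AB and BB.
    Calls that are missing or use other alleles are written as 0 0 0
    (treated as missing; an assumption).
    """
    fields = [snp_id, rs_id, str(position), allele_a, allele_b]
    for genotype in genotypes:
        alleles = genotype.split("_")
        if len(alleles) != 2 or not set(alleles) <= {allele_a, allele_b}:
            fields.extend(["0", "0", "0"])
            continue
        probs = ["0", "0", "0"]
        probs[alleles.count(allele_b)] = "1"  # 0, 1 or 2 copies of allele B
        fields.extend(probs)
    return " ".join(fields)
```

For example, calls A_A, A_G, G_G and ? with alleles A and G produce the probability triples 1 0 0, 0 1 0, 0 0 1 and 0 0 0.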

2014-05-13 Linear and Logistic Regression with Interactions

This script will output the results from either a Linear or Logistic Regression Analysis run on all numeric columns, with one dependent variable and multiple interacting and non-interacting covariates. This script uses the numpy, scipy, and statsmodels packages. More info »

2014-04-23 Subset by Chromosome

This script scans genetic marker mapped columns and creates a subset spreadsheet for each unique chromosome with active data in the spreadsheet. More info »

2014-02-03 Consecutive Numeric Regression Analysis

This script will output the results from consecutive numeric regression tests run on one or more dependents. More info »

2013-12-27 Import Minimac Output

This script will import the Minimac info and dose files, which contain the phased genotype dosages output by the Minimac software. More info »

2013-12-27 BEAGLE/BEAGLECALL Scripts Package

These scripts are for importing and exporting files from the BEAGLE and BEAGLECALL Genetic Analysis Software Packages. More info »

2013-12-27 Import Concatenated Genotype String File

This import script is designed to import genotypic data that is stored in a concatenated string format. The user will specify the variant name column, sample name column, and data column(s) as well as the genotype encoding. More info »

2013-12-27 Export MACH PED_DAT Files

This script exports MACH/Merlin PED and DAT formatted files. Run this script from a pedigree spreadsheet that can contain as many phenotypes as desired. The user will be provided with the option to create one file per chromosome if a marker map is applied to the pedigree spreadsheet. More info »

2013-11-20 Subset by Category

This script creates a row subset spreadsheet for each unique entry in a user selected categorical, binary or genotypic column with active data in the spreadsheet. More info »

2013-11-03 Calculate Alt Read Ratio between Two Spreadsheets

This tool calculates the alternate read ratio from an alternate read depth spreadsheet and a reference read depth spreadsheet. The resulting spreadsheet contains the per-cell ratio (Alt Depth)/(Alt Depth + Ref Depth) and can be used for filtering with Set Genotypes to No-Call. More info »
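
The per-cell formula can be sketched as follows, assuming both spreadsheets are nested lists with identical row and column order and None for missing depths. Returning None when both depths are zero is an assumption, and the function names are hypothetical.

```python
def alt_read_ratio(alt_depth, ref_depth):
    """(Alt Depth) / (Alt Depth + Ref Depth); None if missing or zero total depth."""
    if alt_depth is None or ref_depth is None:
        return None
    total = alt_depth + ref_depth
    return alt_depth / total if total > 0 else None


def ratio_spreadsheet(alt_rows, ref_rows):
    """Apply the ratio cell by cell to two same-shaped nested lists."""
    return [
        [alt_read_ratio(alt, ref) for alt, ref in zip(alt_row, ref_row)]
        for alt_row, ref_row in zip(alt_rows, ref_rows)
    ]
```

A heterozygous call typically shows a ratio near 0.5, so cells with ratios far from the value expected for their genotype are natural candidates for Set Genotypes to No-Call.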

2013-10-25 Report Samples with Unique Genotypes

This tool scans a genotype spreadsheet and identifies samples that have unique genotypes, that is, genotypes not found in any other sample at that locus. A report is created with binary columns representing the unique genotypes per sample per variant. More info »
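
A minimal sketch for a single variant, assuming genotypes are strings and that "?" or "?_?" mark missing calls: a sample is flagged when its genotype occurs exactly once among the non-missing calls. The function name is hypothetical.

```python
from collections import Counter

MISSING = {"?", "?_?"}


def unique_genotype_flags(genotypes):
    """For one variant, flag samples whose non-missing genotype occurs exactly once."""
    counts = Counter(g for g in genotypes if g not in MISSING)
    return [g not in MISSING and counts[g] == 1 for g in genotypes]
```

Applying this to every variant column yields the per-sample, per-variant binary flags described above.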

2013-08-26 Subset by Chromosome

This script scans genetic marker mapped columns and creates a subset spreadsheet for each unique chromosome with active data in the spreadsheet. More info »

2013-08-26 Inactivate Duplicate Row Values

This script scans a selected column in a spreadsheet and inactivates rows based on user prompts by either inactivating all copies of the duplicate values or keeping the first occurrence and inactivating all subsequent duplicates. Row values must match exactly, including case, to be considered duplicates. More info »
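
The two inactivation modes can be sketched as below, assuming the column values are plain strings; True means the row stays active. Comparison is exact and case-sensitive, as described above. The function name is hypothetical.

```python
from collections import Counter


def active_after_dedup(values, keep_first=True):
    """Return one flag per row: True = row stays active.

    keep_first=True keeps the first occurrence of each value and
    inactivates later copies; keep_first=False inactivates every copy
    of a duplicated value. Matching is exact and case-sensitive.
    """
    if not keep_first:
        counts = Counter(values)
        return [counts[v] == 1 for v in values]
    seen = set()
    flags = []
    for v in values:
        flags.append(v not in seen)
        seen.add(v)
    return flags
```

Note that "a" and "A" are treated as different values, so neither is inactivated by the other.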

2013-08-26 Inactivate Duplicate Row Labels

This script scans a spreadsheet's row labels and inactivates rows based on user prompts by either inactivating all copies of the duplicate row labels or keeping the first occurrence and inactivating all subsequent duplicates. More info »

2013-08-13 Activate or Inactivate based on Genomic Position

This function activates or inactivates markers in the current spreadsheet based on their existence in another spreadsheet's marker map, in a marker map file, or both. Matching is based only on chromosome and position information from both sources, not on marker labels. More info »

2013-08-13 Nonparametric Association Tests (Binary Dependent)

This function makes use of the scipy package, specifically the scipy.stats.ranksums and scipy.stats.mannwhitneyu functions. With one binary dependent column, the user can perform nonparametric association tests on all numeric columns. More info »

2013-08-13 Nonparametric Correlation

This function makes use of the scipy package, specifically the scipy.stats.spearmanr and scipy.stats.kendalltau functions. With one numeric dependent column, the user can perform nonparametric correlation tests on all numeric columns. More info »

2013-08-01 Compute Odds Ratio CI

This script takes a logistic regression results spreadsheet and calculates 90, 95 or 99% confidence intervals for the Odds Ratio. More info »
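
A sketch of the usual Wald interval, assuming the regression output provides the log odds ratio (beta) and its standard error: OR = exp(beta) and CI = exp(beta ± z * SE), with z taken from the standard normal distribution. Whether the script uses exactly this interval is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(beta, se, level=0.95):
    """Wald CI for an odds ratio from a logistic regression coefficient.

    beta is the log odds ratio and se its standard error; level is
    0.90, 0.95 or 0.99 (any value in (0, 1) works).
    Returns (odds ratio, lower bound, upper bound).
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

Because the interval is symmetric on the log scale, the odds ratio is the geometric mean of the two bounds.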

2013-05-17 Quantile Transformation

This script categorizes a numeric column into N user-specified quantiles. The cutoff points are calculated over all non-missing values, and column values are compared against these cutoffs using <=. More info »
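
A sketch assuming the cutoffs are computed with Python's statistics.quantiles (default "exclusive" method); the script may use a different quantile definition. A value is placed in the first category k whose cutoff satisfies value <= cutoff, and values above every cutoff fall into category N. The function name is hypothetical.

```python
from statistics import quantiles


def quantile_categories(values, n):
    """Assign each value to quantile 1..n; None stays None.

    Cutoffs come from statistics.quantiles over the non-missing values
    (its default "exclusive" method, an assumption). A value gets the
    first category k whose cutoff satisfies value <= cutoff.
    """
    data = [v for v in values if v is not None]
    cutoffs = quantiles(data, n=n)  # n - 1 cut points
    categories = []
    for v in values:
        if v is None:
            categories.append(None)
            continue
        category = n
        for k, cutoff in enumerate(cutoffs, start=1):
            if v <= cutoff:
                category = k
                break
        categories.append(category)
    return categories
```

For the values 1 through 8 and N = 4, the cutoffs are 2.25, 4.5 and 6.75, giving two values per quartile.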

2013-05-01 Import Unsorted VCF Files

This script will import 1000 Genomes .vcf file data into multiple spreadsheets and/or marker map fields. More info »

2013-04-18 Find de Novo Candidate Variants

This tool uses pedigree information to identify candidate functional polymorphisms, defined as variants at which the offspring in a trio has a genotype classified as a Mendelian error. By default, only heterozygous errors are considered candidates. Optionally, homozygous non-reference errors can also be considered; this requires a reference allele field in the marker map. Another option allows the user to restrict computation to affected offspring. More info »
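
The core check can be sketched for one trio and one variant, assuming genotypes in A_G form with "?" or "?_?" for missing calls. A child genotype is Mendelian-consistent when one allele can come from each parent. The function names and option flags are hypothetical, and the affected-offspring option is not shown.

```python
MISSING = {"?", "?_?"}


def mendelian_consistent(child, father, mother):
    """True if the child can inherit one allele from each parent."""
    c1, c2 = child.split("_")
    f_alleles, m_alleles = set(father.split("_")), set(mother.split("_"))
    return (c1 in f_alleles and c2 in m_alleles) or (c2 in f_alleles and c1 in m_alleles)


def de_novo_candidate(child, father, mother, ref_allele=None, include_hom_nonref=False):
    """Flag a trio genotype as a candidate (hypothetical helper).

    By default only heterozygous Mendelian errors count. With
    include_hom_nonref=True, homozygous errors whose allele differs from
    ref_allele also count (a reference allele is then required).
    """
    if child in MISSING or father in MISSING or mother in MISSING:
        return False
    if mendelian_consistent(child, father, mother):
        return False
    c1, c2 = child.split("_")
    if c1 != c2:
        return True  # heterozygous Mendelian error
    if include_hom_nonref:
        if ref_allele is None:
            raise ValueError("a reference allele is required for homozygous errors")
        return c1 != ref_allele
    return False
```

For example, an A_G child of two A_A parents is a heterozygous error and therefore a candidate by default.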

2013-02-26 Activate Variants by Genotype Count Threshold

This tool scans genotypic columns and activates columns based on a user-specified count or percentage threshold of user-specified genotypes. For example, you could use this tool to activate all genotypic columns that contain at least 20% homozygous alternate variants. More info »

2013-01-02 Select Rows from String of Values

This script activates rows of an integer, binary, categorical or genotypic column whose values appear in a comma-separated string entered in the prompt dialog. If no values match, all rows are inactivated. An option allows the state of only the currently active rows to be changed. More info »

2012-12-31 Copy Values into User Notes

This script copies all unique values from integer, binary, categorical or genotypic columns into the User Notes for the selected spreadsheet in a comma separated list for pasting elsewhere. More info »

2012-12-27 Activate or Inactivate based on Marker Map Field

This function takes a map field from the current spreadsheet as input and activates based on existence in another spreadsheet's column or another spreadsheet's map, or both. More info »

2012-10-09 Activate Variants by Sample Genotypes

This tool examines variant data and inactivates genotypic columns that do not follow the specified genotypic patterns for the selected samples. The spreadsheet must contain mapped genotypic columns and the marker map must contain a reference allele field. More info »

2012-08-08 Build Variant Spreadsheet

This tool builds a variant spreadsheet based on a probe track and region definition selected by the user. By default, the entire track is included in the output spreadsheet. More info »

2012-07-31 Subset Informative Genotypes by Category

This tool scans genotypic columns to find informative genotypes defined by having at least one non-missing, non-reference allele. Informative genotype column sets are found for each unique category in a user-defined categorical column. More info »

2012-07-31 Build Sample Collated Spreadsheet

This tool transposes and collates several spreadsheets together. The collated spreadsheet contains a row for each intersecting column in the original spreadsheets and several columns for each original row over all spreadsheets. More info »

2012-07-25 Create Vector from Matrix Spreadsheet

This script transforms a spreadsheet into a tall-skinny formatted vector. The user will choose which column type to transform and it will result in a child spreadsheet node. If the user selects genotypic columns, the allele delimiter can optionally be removed or replaced. More info »

2012-07-10 Filter by SIFT Synonymous Classification

This filter inactivates mapped markers that are either predicted as synonymous or are predicted as nonsynonymous, depending on the inactivation option selected. More info »

2012-07-10 Filter by PolyPhen2 Score

This filter inactivates mapped markers that are either predicted as tolerated or have low confidence (do not pass filters) or are predicted as damaging (pass filters), depending on the inactivation option selected. More info »

2012-06-22 Absolute Risk Reduction

This script calculates the reduced risk for each genotype given binary disease status and treatment status columns. The script requires a spreadsheet that contains at least two binary columns and several genotypic columns. More info »
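
A sketch of the per-genotype calculation, assuming disease and treatment are coded 0/1 and that the absolute risk reduction is the risk among untreated samples minus the risk among treated samples (risk = diseased / total). Genotypes with an empty treatment arm map to None. The exact definition used by the script and the function name are assumptions.

```python
from collections import defaultdict


def absolute_risk_reduction(disease, treated, genotypes):
    """ARR per genotype = risk(untreated) - risk(treated).

    disease and treated are 0/1 values (None = missing); risk is the
    fraction of diseased samples within each treatment arm. Genotypes
    with an empty arm map to None.
    """
    # counts[genotype][arm] = [number diseased, number of samples]
    counts = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})
    for d, t, g in zip(disease, treated, genotypes):
        if d is None or t is None or g in ("?", "?_?"):
            continue
        counts[g][t][0] += d
        counts[g][t][1] += 1
    result = {}
    for g, arms in counts.items():
        (d0, n0), (d1, n1) = arms[0], arms[1]
        result[g] = d0 / n0 - d1 / n1 if n0 and n1 else None
    return result
```

For example, if 2 of 4 untreated and 1 of 4 treated A_A samples are diseased, the reduction for A_A is 0.5 - 0.25 = 0.25.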

2012-06-21 Convert Dosages to Genotypes

This script converts allelic dosage values to genotypes based on user-specified thresholds. The dosage data may be in Single- or Double-Dosage format and may have samples in the row labels or column headers. If the samples are in the column headers, the spreadsheet may contain map information and allele translation values. More info »

2012-06-20 Calculate Pseudo Lambda

This script calculates a pseudo-lambda value on a column containing p-values. The formula used to calculate the pseudo lambda value is as follows: More info »

2012-05-31 Move Columns to Location

This script moves all columns of a user-specified type to the user-specified location (Beginning or End). This will allow the user to group all columns of the same type together. More info »

2012-05-03 Chi-Squared Test with Continuity Correction

This script performs a Chi-squared test on a spreadsheet with a binary dependent and genotypic data. The output will contain results for the traditional test as well as results with the Yates continuity correction applied. The following genetic models are available: Basic Allelic, Dominant, or Recessive. More info »
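
For a 2x2 table [[a, b], [c, d]], the statistic is n * |ad - bc|^2 / ((a+b)(c+d)(a+c)(b+d)), and the Yates correction subtracts n/2 from |ad - bc| before squaring. A stdlib-only sketch, assuming 1 degree of freedom (so the p-value equals erfc(sqrt(x/2))); clamping the corrected difference at zero and the function name are assumptions.

```python
import math


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for the table [[a, b], [c, d]] with 1 degree of freedom.

    For the basic allelic model, rows could be cases/controls and columns
    allele counts (an illustrative layout). Returns (statistic, p-value).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # Yates continuity correction
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2))  # survival function of chi-square(1)
    return stat, p_value
```

The correction always shrinks the statistic, so the corrected p-value is larger (more conservative) than the uncorrected one.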

2012-02-28 Import Merlin PED DAT

This import script imports PED/DAT files created in MERLIN. File delimiters may be comma, whitespace or tab and allele delimiters may be whitespace or /. The data may include several phenotype, covariate and genotype columns. More info »

2012-02-10 MAF Filtering on Recoded Spreadsheet

This script calculates minor allele frequency (MAF) on recoded data created by Recode Genotypes with X Chromosome Adjustment. More info »

2012-02-10 Recode Genotypes with X Chromosome Adjustment

This script recodes genotypes based on an additive model with major/minor allele classification. Markers within the selected chromosomes are adjusted for male samples. More info »

2012-02-07 Import Tall Skinny Format

This import script is designed to import genotypic data that is stored in a tall skinny format. The user will specify the variant name column, sample name column, and data column(s). More info »
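
The reshaping step can be sketched as a pivot from (variant, sample, value) records to a sample-by-variant matrix, assuming each variant/sample pair appears at most once (a repeated pair keeps the last value) and absent pairs become None. The function name is hypothetical.

```python
def tall_to_matrix(records):
    """Pivot (variant, sample, value) records into a sample-by-variant matrix.

    Rows and columns keep first-seen order; missing pairs become None and
    a repeated pair keeps the last value seen (an assumption).
    """
    records = list(records)
    samples = list(dict.fromkeys(sample for _, sample, _ in records))
    variants = list(dict.fromkeys(variant for variant, _, _ in records))
    cells = {(sample, variant): value for variant, sample, value in records}
    matrix = [[cells.get((s, v)) for v in variants] for s in samples]
    return samples, variants, matrix
```

Using dict.fromkeys preserves the order in which samples and variants first appear in the file.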

2012-02-03 Filter Columns by Regular Expression

This script takes a regular expression and activates all columns which contain an expression match in the column header. More info »

2012-01-30 Create Column From Row Labels

This script allows the user to add the row labels as a column in the spreadsheet. More info »

2012-01-25 Average Markers by Gene

This script calculates an average value for each row over each region as defined by a gene annotation track or a string marker map field. This script requires a marker mapped spreadsheet with several quantitative columns. More info »

2011-11-10 Inactivate Duplicate Column Headers

This script scans a spreadsheet's column headers and inactivates all additional occurrences of a column header (only the first occurrence remains active). More info »

2011-11-10 Convert Binary and Integer Values to Genotypes

This script recodes binary and integer genotypes to the standard genotype format of A_A, A_B, and B_B. The user is prompted for the values corresponding to A_A, A_B, and B_B, and all other numbers are encoded as missing. Thus, if there is multi-allelic data in the spreadsheet, all numbers other than those specified will be encoded as "?". More info »

2011-11-10 Create Pseudo Marker Mapped Spreadsheet

From a non-marker mapped spreadsheet, this script creates a new marker mapped spreadsheet with a pseudo marker map that places the markers on chromosome 1 at positions 1 through N, where N is the number of rows. More info »

2011-11-10 Create Spreadsheet for Segmentation

Based on a column from a spreadsheet, this script creates a new spreadsheet with a pseudo marker map and generic column headers making it suitable for running CNAM optimal segmenting. More info »

2011-11-10 Frequency Table

This script will calculate the frequency distribution of two columns in a spreadsheet. The script can be accessed through the scripts menu and will prompt the user to select two non-real columns. More info »

2011-11-10 Split Column on Specified Delimiter

This script prompts the user to select a column that needs to be split on a specified delimiter and for the delimiter to use. The delimiter can be more than one character. More info »

2011-11-10 Genetic Distance between Samples

This script calculates the genetic distance between samples. More info »

2011-11-10 MIP CN Transformation

This script creates 5 transposed spreadsheets, one for each column imported from the MIP Array copy number text file: Copy A, Copy B, CopyNumber, AlleleRatio, and AllelicDifference. More info »

2011-11-10 Select Subset of Data by XY Coordinates

This script takes an upper and lower bound for two numeric columns and creates a subset spreadsheet for the two columns. More info »

2011-10-25 CMH Test over Several Strata

This script is designed to calculate Cochran-Mantel-Haenszel statistics, given several different spreadsheets corresponding to data from several different strata. More info »
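
A stdlib-only sketch of the 2x2xK CMH statistic without continuity correction, assuming each stratum is supplied as a 2x2 table [[a, b], [c, d]]. It sums a - E[a] and Var(a) across strata and compares the squared total against a chi-square distribution with 1 degree of freedom. Whether the script applies a continuity correction is not stated here, and the function name is hypothetical.

```python
import math


def cmh_test(tables):
    """CMH statistic and p-value for a list of 2x2 tables [[a, b], [c, d]]."""
    sum_diff = 0.0
    sum_var = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        sum_diff += a - row1 * col1 / n  # observed minus expected count
        sum_var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if sum_var == 0:
        raise ValueError("no informative strata")
    stat = sum_diff ** 2 / sum_var
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square(1) survival function
    return stat, p_value
```

With a single stratum the CMH statistic equals the Pearson chi-square multiplied by (n - 1)/n.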

2011-10-04 Genotype Statistics Summary

This script takes a spreadsheet that contains a case/control dependent variable and SNPs, and runs all of the genotype association tests as well as tests for a heterozygous advantage model (Dd vs DD, dd) and a homozygous comparison model (DD vs dd). It also calculates chi-squared scores and correlation/trend test scores, and produces complete count tables. More info »

2011-07-28 Alternate Allele Frequency

This script calculates the percentage of alternate alleles over all samples for each variant. The resulting spreadsheet has columns containing the reference allele, alternate allele, alternate allele frequency, reference allele count and alternate allele count. More info »
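
A sketch for one variant, assuming A_G-style genotypes, a known reference allele, and that every non-reference allele counts as alternate. It returns the frequency as a fraction; multiply by 100 for a percentage. The function name is hypothetical.

```python
def alt_allele_summary(genotypes, ref_allele):
    """Return (ref count, alt count, alt frequency) for one variant.

    Every allele that differs from ref_allele counts as alternate;
    missing calls ("?" or "?_?") are skipped. Frequency is a fraction
    (None if there are no called alleles).
    """
    ref_count = alt_count = 0
    for genotype in genotypes:
        if genotype in ("?", "?_?"):
            continue
        for allele in genotype.split("_"):
            if allele == ref_allele:
                ref_count += 1
            else:
                alt_count += 1
    total = ref_count + alt_count
    return ref_count, alt_count, (alt_count / total if total else None)
```

Missing calls are excluded from the denominator, so the frequency is computed over called alleles only.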

2011-06-23 Create Table for Significant Region

This script creates a spreadsheet with significant regions from a spreadsheet of p-values. It extracts p-values more extreme than a certain significance value (cutoff) and combines the remaining markers into segments. If two markers are on different chromosomes or more than a certain distance apart (split), a new region is created. More info »
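
The segmentation logic can be sketched as follows, assuming markers are sorted by chromosome and position and that "more extreme than the cutoff" means p < cutoff. The returned tuple layout and the function name are hypothetical.

```python
def significant_regions(markers, cutoff, split):
    """Group significant markers into regions.

    markers: (chromosome, position, p_value) tuples sorted by chromosome
    then position. Markers with p < cutoff are kept; a new region starts
    when the chromosome changes or the gap to the previous kept marker
    exceeds `split`. Returns (chromosome, start, end, n_markers, min_p).
    """
    regions = []
    current = None
    for chrom, pos, p in markers:
        if p is None or p >= cutoff:
            continue
        if current and current[0] == chrom and pos - current[2] <= split:
            current[2] = pos  # extend the open region
            current[3] += 1
            current[4] = min(current[4], p)
        else:
            if current:
                regions.append(tuple(current))
            current = [chrom, pos, pos, 1, p]
    if current:
        regions.append(tuple(current))
    return regions
```

Non-significant markers between two significant ones do not break a region by themselves; only the chromosome and the split distance do.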

2011-04-21 Filter by Marker Map Field

This function takes a map field from the current spreadsheet as input, then activates or inactivates based on a given threshold or list, or both. More info »

2011-04-21 KBAC with Permutation Testing

The Kernel-Based Adaptive Cluster (KBAC) method [Liu and Leal 2010] first catalogs the variant data within each of a number of regions into multi-marker genotypes. Since the variants are rare, only relatively few distinct multi-marker genotypes will be found in any given region. More info »

2011-04-21 Apply Additional Marker Map

This function will apply an additional marker map to a currently mapped spreadsheet. The user can choose to apply the new map's data to only unmapped columns or to all columns, preferring either the new or the old marker map information. More info »

2011-04-21 LD Pairwise Analysis Scripts

This script outputs LD analysis results from both the EM and CHM methods, including both R² and D' values. More info »
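
Once haplotype frequencies have been estimated (by EM or CHM; the estimation step is not shown), both statistics follow from D = p_AB - p_A * p_B. A sketch, assuming allele and haplotype frequencies are given as fractions; the signed D' shown here is one common convention, and the function name is hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at locus 1
    and allele B at locus 2; p_a and p_b are the allele frequencies.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

D' reaches ±1 whenever one of the four haplotypes is absent, while r² also depends on how similar the two allele frequencies are.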

2011-04-18 Highlight Values in XY Scatter Plot

This script plots an XY scatter plot with additional graph items to highlight values of interest. An independent column, a dependent column and a sample list are needed. More info »

2011-03-14 Import PennCNV

This script imports PennCNV input signal intensity files, where each file contains data for a single sample. More info »

2011-03-11 Append Several Spreadsheets

This function allows the user to append several spreadsheets in one dialog, saving the user from having to append each spreadsheet individually. More info »

2011-03-03 ANOVA on Numeric Columns

This function makes use of the scipy package, specifically the scipy.stats.f_oneway and scipy.stats.kruskal functions. This requires a categorical dependent column that provides the grouping structure and several numeric columns. More info »

2011-03-03 Import Affymetrix CN Segment Files

This script will import Affymetrix CN Segment files containing copy number segment data as output by Affymetrix software. More info »

2011-03-03 Filter by SIFT Synonymous Classification

This function scans a marker-mapped spreadsheet with several genotypic columns and investigates the corresponding SIFT marker map synonymous or non-synonymous classifications. This script requires the purchase of the Sequence Module to function. More info »

2011-01-26 Affymetrix B Allele Frequency Calculation

Using Affymetrix CEL files as its source, this script combines quantile normalized SNP A and B probe intensities for each marker into a theta value, then calculates B-Allele Frequencies for each marker. More info »

2011-01-26 Calculate Expected P-value

This script takes a spreadsheet that contains a p-value column and calculates expected p-values for the specified column. Optionally, expected –log10 p-values can also be exported. More info »
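
A sketch assuming expected p-values are taken as rank / (n + 1) for the sorted observed values; some tools use (rank - 0.5) / n instead, and which convention the script uses is not stated here. The function names are hypothetical.

```python
import math


def expected_pvalues(pvalues):
    """Expected p-value for each observed p-value under a uniform null.

    The i-th smallest of n values is assigned i / (n + 1) (an assumption);
    results are returned in the original order.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    expected = [0.0] * n
    for rank, i in enumerate(order, start=1):
        expected[i] = rank / (n + 1)
    return expected


def neg_log10(values):
    """Optional -log10 transform, for example for a Q-Q plot."""
    return [-math.log10(v) for v in values]
```

Plotting -log10 observed against -log10 expected values gives the familiar Q-Q plot used to spot inflation or true signals.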

2011-01-26 Chi-Squared Contingency Table

This script computes the Pearson’s Chi-Squared Statistic for a contingency table with m groups and n observations (m rows and n columns). For 2x2 tables the p-value, –log10 p-value, Bonferroni p-value and –log10 Bonferroni p-value are also computed. More info »

2011-01-26 CNV PCA Search

Given a spreadsheet, this script prompts for a principal components spreadsheet, a lower and upper bound on the number of components, and a step size. It runs association tests using each component setting, performs a linear regression on the least significant 90% of the data, and reports the slope of the line and a goodness-of-fit statistic. This script can be used in conjunction with the CNV PCA Search Tutorial. More info »

2011-01-26 Create Table for Significant Regions

This script creates a spreadsheet with significant regions from a spreadsheet of p-values (in the first column). It also extracts p-values more extreme than a certain significance value (cutoff) and combines the remaining markers into segments. If two markers are on different chromosomes or more than a certain distance apart (split), a new region is created. More info »

2011-01-26 Log Ratio Tails

This script calculates percentile values for the upper and lower tails of log ratios using two user-specified thresholds. Missing values are skipped. A log ratio call rate is returned with the results. This script may also be used to identify percentiles for real-value data other than log ratios. More info »

2011-01-26 Row Averages with Histogram

This script will create a column subset from a numeric spreadsheet, then take the row averages and create a histogram of those averages. The subset is specified with a column chooser. This function is useful for LogR spreadsheets to investigate for possible CNVs. More info »

2011-01-26 Sample Pair Mismatch

This script compares genotype calls from NSP and STY files and calculates the correlation between the nearest markers in the two sets. If there is a high correlation, the NSP and STY markers correspond to the same person, otherwise there is a mismatch. More info »

2011-01-26 SNP Cluster Plots

This script creates scatter plots based on A and B allele intensities that can be split on SNP genotypes to create tri-colored cluster plots. The script will work for up to 100 SNPs at a time. More info »

SVS Software is intended for Research Use Only. Not for use in diagnostic procedures.