Add-On Scripts Repository for SNP & Variation Suite

Here you will find a collection of Python scripts submitted by Golden Helix developers and our customers. All scripts are provided at no additional cost, so feel free to download, use, and even enhance them!

Share your scripts with the Golden Helix Community

If you have written any scripts and would like to share them with other SVS users, we encourage you to email a *.txt or *.py file to [email protected] with any accompanying documentation or special instructions. Once we test your script and check its validity, we'll post it on this page for others to download.

Keep informed on new scripts by subscribing to the technical support bulletins feed »

What is Python?

Python is a clear and powerful object-oriented programming language, comparable to Perl, Ruby, Scheme, or Java. Integrating Python into SVS provides full programmatic access to many of the software's features, enabling you to augment existing tools, create entirely new ones, automate workflows, integrate with other programs, and more.

Python Learning Resources

Date	Script	Download
2023-12-06	Calculate Approximate Lambda from P Values This script follows a more standard practice for obtaining a Genomic-Control lambda value from a set of p-values where chi-squared values are not available or not used to obtain those p-values. More info »
2023-12-05	Create Pseudo Marker Mapped Spreadsheet From a non-marker mapped spreadsheet this script creates a new marker mapped spreadsheet with a pseudo marker map containing chromosome 1, positions 1 - # Active Rows or Active Columns. More info »
2021-08-24	Import Tall Skinny Format This import script is designed to import genotypic data that is stored in a tall skinny format. The user will specify the variant name column, sample name column, and data column(s). More info »
2021-05-27	LD Score Regression Calculates heritability scores, genetic correlation, and genetic covariance using the ldscore module (installed separately). More info »
2021-03-04	MLM with Multiple Phenotypes This script runs the equivalent of Spreadsheet > Genotype > Mixed Linear Model Analysis on multiple dependent phenotype columns, one column at a time. All possible Mixed Linear Model Analysis tests can be run on each of your dependent columns, which may be binary, integer-valued or real-valued columns. More info »
2021-02-26	Row Averages with Histogram This script will create a column subset from a numeric spreadsheet, then take the row averages and create a histogram of those averages. The subset is specified with a column chooser. This function is useful for investigating possible CNVs in LogR spreadsheets. More info »
2021-02-19	Consecutive Numeric Regression Analysis This script will output the results from consecutive numeric regression tests run on one or more dependents. More info »
2021-02-09	Convert Real Columns to Single Precision This script recodes the double-precision data in the “Real” columns in your spreadsheet into single-precision data. More info »
2021-02-11	Run Multiple Genotype Association Tests This script runs genotypic association tests on multiple dependent phenotype columns. More info »
2021-01-21	Row Averages for Active Numeric Columns This script will take the active numeric (binary, integer, or real) columns of the current spreadsheet and compute the row averages of these columns, afterward creating a histogram of these averages. More info »
2019-03-12	Convert Binary and Integer Values to Genotypes This script recodes binary and integer genotypes to the standard genotype format of A_A, A_B, and B_B. Prompts for value of A_A, A_B, and B_B. All other numbers are encoded as missing. Thus if there is multi-allelic data in the spreadsheet, all numbers other than those specified will be encoded as "?". More info »
2016-10-19	Annotate and Filter Variants This script will annotate and filter variants based on information found in Variant, Interval, and Gene sources. More info »
2016-10-19	Variant Classification This script will perform Variant Classification on a marker mapped genotype spreadsheet. More info »
2016-08-03	Activate ATCG SNPs to flip strand or to exclude SNPs These scripts can be used to identify SNPs that have ambiguous strand orientation by comparing a genotype dataset with a reference dataset, such as HapMap data. More info »
2016-03-04	Illumina Multiple Final Report File Import This script is designed to automate the import and merging of multiple Illumina Final Report files. More info »
2016-03-04	Multilocus Risk Score This script will output the risk scores for each sample for each score provided. This script uses the numpy package. More info »
2015-10-06	Correct P-Values for Multiple Tests This script takes a column of p-values and outputs several multiple testing corrections including Bonferroni, FDR (Storey 2002), BH FDR (Benjamini-Hochberg 1995) and BY FDR (Benjamini-Yekutieli 2001). More info »
2015-08-13	Export LongGen File This script will export genotypic spreadsheet data into PLINK’s LongGen file format. More info »
2015-05-08	Import Sorted VCF Files This function imports 1000 Genomes .vcf file data into multiple spreadsheets. Special handling is provided for genotype data. The user can choose to import one VCF file or several VCF files simultaneously. More info »
2015-04-22	Add Annotation Data to Marker Map From Spreadsheet This function takes the marker map applied to the current spreadsheet and adds specified annotation data from overlapping interval(s) to each marker in the marker map. More info »
2015-03-20	Rename Genotypes This script scans the genotypic columns to find all existing genotypes, and then prompts for replacements. The resulting spreadsheet has the same dimensions with the appropriate genotype substitutions. More info »
2015-01-21	Join or Merge Several Spreadsheets This function allows the user to merge several spreadsheets in one dialog, saving the user from having to merge each spreadsheet individually. More info »
2014-12-17	ANOVA with Phenotype and SNPs This function makes use of the scipy package, specifically the scipy.stats.f_oneway and scipy.stats.kruskal functions. This requires a numeric phenotype column and several genotype columns which provide the grouping structure in each test. More info »
2014-08-20	Select Random Subset by Category This script prompts the user for a fraction and will then activate that fraction of random samples from each unique category of a genotypic, categorical or binary column. More info »
2014-06-23	Extract Info from Regression Stats Viewer This script scans the Regression Statistics Viewer output and prints out the p-value after correcting for any covariates. This script must be used on a regression that corrects for covariates, and is not meant to be used with moving windows. More info »
2014-06-12	Export Impute2 Genotype Probabilities From a spreadsheet containing marker-mapped genotypic columns, this script saves the spreadsheet as a series of Impute2 chr.gen files with the corresponding chr.sample files in the specified directory. More info »
2014-05-13	Linear and Logistic Regression with Interactions This script will output the results from either a Linear or Logistic Regression Analysis run with one dependent variable and multiple interacting and non-interacting covariates on all numeric columns. This script uses the numpy, scipy, and statsmodels packages. More info »
2014-04-23	Subset by Chromosome This script scans genetic marker mapped columns and creates a subset spreadsheet for each unique chromosome with active data in the spreadsheet. More info »
2013-12-27	Import Minimac Output This script will import Minimac info and dose files with phased genotype dosages which are output from running the Minimac software. More info »
2013-12-27	BEAGLE/BEAGLECALL Scripts Package These scripts are for importing and exporting files from the BEAGLE and BEAGLECALL Genetic Analysis Software Packages. More info »
2013-12-27	Import Concatenated Genotype String File This import script is designed to import genotypic data that is stored in a concatenated string format. The user will specify the variant name column, sample name column, and data column(s) as well as the genotype encoding. More info »
2013-12-27	Export MACH PED_DAT Files This script exports MACH/Merlin PED and DAT formatted files. Run this script from a pedigree spreadsheet that can contain as many phenotypes as desired. The user will be provided with the option to create one file per chromosome if a marker map is applied to the pedigree spreadsheet. More info »
2013-11-20	Subset by Category This script creates a row subset spreadsheet for each unique entry in a user selected categorical, binary or genotypic column with active data in the spreadsheet. More info »
2013-11-03	Calculate Alt Read Ratio between Two Spreadsheets This tool calculates the ratio of alternate reads given an alternate read spreadsheet and a reference read spreadsheet. The resulting spreadsheet contains the per-cell ratio as (Alt Depth)/(Alt Depth + Ref Depth) and can be used for filtering purposes with Set Genotypes to No-Call. More info »
2013-10-25	Report Samples with Unique Genotypes This tool scans a genotype spreadsheet and identifies samples that have unique genotypes, that is, genotypes not found in any other sample at that locus. A report is created with binary columns representing the unique genotypes per sample per variant. More info »
2013-08-26	Inactivate Duplicate Row Values This script scans a selected column in a spreadsheet and inactivates rows based on user prompts by either inactivating all copies of the duplicate values or keeping the first occurrence and inactivating all subsequent duplicates. Row values need to match exactly, including case, to be considered duplicates. More info »
2013-08-26	Inactivate Duplicate Row Labels This script scans a spreadsheet's row labels and inactivates rows based on user prompts by either inactivating all copies of the duplicate row labels or keeping the first occurrence and inactivating all subsequent duplicates. More info »
2013-08-13	Activate or Inactivate based on Genomic Position This function activates or inactivates markers in the current spreadsheet based on existence in another spreadsheet's marker map or existence in a marker map file, or both. Matching is done based only on chromosome and position information from both sources and not on marker labels. More info »
2013-08-13	Nonparametric Association Tests (Binary Dependent) This function makes use of the scipy package, specifically the scipy.stats.ranksums and scipy.stats.mannwhitneyu functions. With one binary dependent column, the user can perform nonparametric association tests on all numeric columns. More info »
2013-08-13	Nonparametric Correlation This function makes use of the scipy package, specifically the scipy.stats.spearmanr and scipy.stats.kendalltau functions. With one numeric dependent column, the user can perform nonparametric correlation tests on all numeric columns. More info »
2013-08-01	Compute Odds Ratio CI This script takes a logistic regression results spreadsheet and calculates 90, 95 or 99% confidence intervals for the Odds Ratio. More info »
2013-05-17	Quantile Transformation This script categorizes a numeric column into N user-specified quantiles. The cutoff points are calculated over all non-missing values and column values are compared against these cutoffs with <= . More info »
2013-05-01	Import Unsorted VCF Files This script will import 1000 Genomes .vcf file data into multiple spreadsheets and/or marker map fields. More info »
2013-04-18	Find de Novo Candidate Variants This tool uses pedigree information to identify candidate functional polymorphisms, defined as the offspring in a trio having a genotype classified as a Mendelian error. By default, only heterozygous errors are considered candidates. Optionally, homozygous non-reference errors can be considered; this requires a reference allele field to be present in the marker map. Another option allows the user to restrict computation to affected offspring. More info »
2013-03-28	Convert Integer and Real Columns to Binary by Threshold This script converts all columns of the specified type (Integer or Real or both) to binary by threshold. More info »
2013-02-26	Activate Variants by Genotype Count Threshold This tool scans genotypic columns and activates columns based on a user-specified count or percentage threshold of user-specified genotypes. For example, you could use this tool to activate all genotypic columns that contain at least 20% homozygous alternate variants. More info »
2013-01-02	Select Rows from String of Values This script activates rows that contain values listed in a comma-separated string entered in the prompt dialog for integer, binary, categorical or genotypic columns. If no values match, all rows are inactivated. An option allows changing the state of active rows only. More info »
2012-12-31	Copy Values into User Notes This script copies all unique values from integer, binary, categorical or genotypic columns into the User Notes for the selected spreadsheet in a comma separated list for pasting elsewhere. More info »
2012-12-27	Activate or Inactivate based on Marker Map Field This function takes a map field from the current spreadsheet as input and activates based on existence in another spreadsheet's column or another spreadsheet's map, or both. More info »
2012-10-09	Activate Variants by Sample Genotypes This tool examines variant data and inactivates genotypic columns that do not follow the specified genotypic patterns for the selected samples. The spreadsheet must contain mapped genotypic columns and the marker map must contain a reference allele field. More info »
2012-08-08	Build Variant Spreadsheet This tool builds a variant spreadsheet based on a probe track and region definition selected by the user. By default, the entire track is included in the output spreadsheet. More info »
2012-07-31	Build Sample Collated Spreadsheet This tool transposes and collates several spreadsheets together. The collated spreadsheet contains a row for each intersecting column in the original spreadsheets and several columns for each original row over all spreadsheets. More info »
2012-07-25	Create Vector from Matrix Spreadsheet This script transforms a spreadsheet into a tall-skinny formatted vector. The user will choose which column type to transform and it will result in a child spreadsheet node. If the user selects genotypic columns, the allele delimiter can optionally be removed or replaced. More info »
2012-06-22	Absolute Risk Reduction This script calculates the reduced risk for each genotype given binary disease status and treatment status columns. The script requires a spreadsheet that contains at least two binary columns and several genotypic columns. More info »
2012-06-21	Convert Dosages to Genotypes This script converts allelic dosage values to genotypes based on user-specified thresholds. The dosage data may be in Single- or Double-Dosage format and may have samples in the row labels or column headers. If the samples are in the column headers, the spreadsheet may contain map information and allele translation values. More info »
2012-05-31	Move Columns to Location This script moves all columns of a user-specified type to the user-specified location (Beginning or End). This will allow the user to group all columns of the same type together. More info »
2012-05-03	Chi-Squared Test with Continuity Correction This script performs a Chi-squared test on a spreadsheet with a binary dependent and genotypic data. The output will contain results for the traditional test as well as results with the Yates continuity correction applied. The following genetic models are available: Basic Allelic, Dominant, or Recessive. More info »
2012-02-28	Import Merlin PED DAT This import script imports PED/DAT files created in MERLIN. File delimiters may be comma, whitespace or tab and allele delimiters may be whitespace or /. The data may include several phenotype, covariate and genotype columns. More info »
2012-02-10	MAF Filtering on Recoded Spreadsheet This script calculates minor allele frequency (MAF) on recoded data created by Recode Genotypes with X Chromosome Adjustment. More info »
2012-02-10	Recode Genotypes with X Chromosome Adjustment This script recodes genotypes based on an additive model with major/minor allele classification. Markers within the selected chromosomes are adjusted for male samples. More info »
2012-01-30	Create Column From Row Labels This script allows the user to add the row labels as a column in the spreadsheet. More info »
2012-01-25	Average Markers by Gene This script calculates an average value for each row over each region as defined by a gene annotation track or a string marker map field. This script requires a marker mapped spreadsheet with several quantitative columns. More info »
2011-11-10	Inactivate Duplicate Column Headers This script scans a spreadsheet's column headers and inactivates all additional occurrences of a column header (only the first occurrence remains active). More info »
2011-11-10	Create Spreadsheet for Segmentation Based on a column from a spreadsheet, this script creates a new spreadsheet with a pseudo marker map and generic column headers making it suitable for running CNAM optimal segmenting. More info »
2011-11-10	Frequency Table This script will calculate the frequency distribution of two columns in a spreadsheet. The script can be accessed through the scripts menu and will prompt the user to select two non-real columns. More info »
2011-11-10	Split Column on Specified Delimiter This script prompts the user to select a column that needs to be split on a specified delimiter and for the delimiter to use. The delimiter can be more than one character. More info »
2011-11-10	Genetic Distance between Samples This script is designed to calculate Cochran-Mantel-Haenszel statistics, given several different spreadsheets corresponding to data from several different strata. More info »
2011-11-10	MIP CN Transformation This script creates 5 transposed spreadsheets, one for each column imported from the MIP Array copy number text file: Copy A, Copy B, CopyNumber, AlleleRatio, and AllelicDifference. More info »
2011-11-10	Select Subset of Data by XY Coordinates This script takes an upper and lower bound for two numeric columns and creates a subset spreadsheet for the two columns. More info »
2011-10-25	CMH Test over Several Strata This script is designed to calculate Cochran-Mantel-Haenszel statistics, given several different spreadsheets corresponding to data from several different strata. More info »
2011-10-04	Genotype Statistics Summary This script takes a spreadsheet that contains a case/control dependent variable and SNPs and runs all of the genotype association tests as well as tests for a heterozygous advantage model (Dd vs DD, dd) and a homozygous comparison model (DD vs dd). Also calculates Chi Squared Scores, Correlation/Trend test scores and completes count tables. More info »
2011-04-21	Filter by Marker Map Field This function takes a map field from the current spreadsheet as input, then activates or inactivates based on a given threshold or list, or both. More info »
2011-04-21	Apply Additional Marker Map This function will apply an additional marker map to a currently mapped spreadsheet. The user can choose to apply the new map's data to only unmapped columns or to all columns, preferring either new marker map or old marker map information. More info »
2011-04-18	Highlight Values in XY Scatter Plot This script plots an XY scatter plot with additional graph items to highlight values of interest. An independent column, dependent column and sample list is needed. More info »
2011-03-14	Import PennCNV This script imports PennCNV input signal intensity files, where each file contains data for a single sample. More info »
2011-03-11	Append Several Spreadsheets This function allows the user to append several spreadsheets in one dialog, saving the user from having to append each spreadsheet individually. More info »
2011-03-03	ANOVA on Numeric Columns This function makes use of the scipy package, specifically the scipy.stats.f_oneway and scipy.stats.kruskal functions. This requires a categorical dependent column that provides the grouping structure and several numeric columns. More info »
2011-03-03	Import Affymetrix CN Segment Files This script will import Affymetrix CN Segment files containing copy number segment data as outputted from Affymetrix. More info »
2011-01-26	Affymetrix B Allele Frequency Calculation Using Affymetrix CEL files as its source, this script combines quantile normalized SNP A and B probe intensities for each marker into a theta value, then calculates B-Allele Frequencies for each marker. More info »
2011-01-26	Calculate Expected P-value This script takes a spreadsheet that contains a p-value column and calculates expected p-values for the specified column. Expected –log10 p-values can optionally be exported as well. More info »
2011-01-26	Chi-Squared Contingency Table This script computes the Pearson’s Chi-Squared Statistic for a contingency table with m groups and n observations (m rows and n columns). For 2x2 tables the p-value, –log10 p-value, Bonferroni p-value and –log10 Bonferroni p-value are also computed. More info »
2011-01-26	CNV PCA Search Given a spreadsheet, this script prompts for a principal components spreadsheet, a lower and upper bound on the number of components, and a step size. It runs association tests using each components setting, does a linear regression on the least significant 90% of the data, and reports the slope of the line and a goodness of fit statistic. This script can be used in conjunction with the CNV PCA Search Tutorial. More info »
2011-01-26	Create Table for Significant Regions This script creates a spreadsheet with significant regions from a spreadsheet of p-values (in the first column). It also extracts p-values more extreme than a certain significance value (cutoff) and combines the remaining markers into segments. If two markers are on different chromosomes or more than a certain distance apart (split), a new region is created. More info »
2011-01-26	Log Ratio Tails This script calculates percentile values for the upper and lower tails of log ratios using two user-specified thresholds. Missing values are skipped. A log ratio call rate is returned with the results. This script may also be used to identify percentiles for real-value data other than log ratios. More info »
2011-01-26	Sample Pair Mismatch This script compares genotype calls from NSP and STY files and calculates the correlation between the nearest markers in the two sets. If there is a high correlation, the NSP and STY markers correspond to the same person, otherwise there is a mismatch. More info »
2011-01-26	SNP Cluster Plots This script creates scatter plots based on A and B allele intensities that can be split on SNP genotypes to create tri-colored cluster plots. The script will work for up to 100 SNPs at a time. More info »
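
As a rough illustration of the idea behind the "Calculate Approximate Lambda from P Values" entry, a Genomic-Control lambda can be approximated by converting each p-value to an equivalent 1-degree-of-freedom chi-squared statistic and dividing the median of those statistics by the expected median (about 0.4549). The sketch below is standalone Python using only the standard library. It is not the repository script and does not use the SVS API; the function names and the handling of missing values are assumptions made for illustration.

```python
from statistics import NormalDist, median

# Median of a chi-squared distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def p_to_chi2_1df(p):
    """Convert a p-value to the equivalent 1-df chi-squared statistic."""
    # For 1 df, chi2 = z**2 where z is the normal quantile at p / 2.
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z


def approximate_lambda(p_values):
    """Median of the implied chi-squared statistics over the expected median."""
    # Assumption: missing (None) and out-of-range values are skipped.
    stats = [p_to_chi2_1df(p) for p in p_values
             if p is not None and 0.0 < p <= 1.0]
    if not stats:
        raise ValueError("no usable p-values")
    return median(stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Evenly spread p-values show no inflation, so lambda is close to 1.
    n = 101
    uniform_p = [(i - 0.5) / n for i in range(1, n + 1)]
    print(approximate_lambda(uniform_p))
```

A value noticeably above 1 suggests inflated test statistics, for example from population stratification.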
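
Two of the corrections named in the "Correct P-Values for Multiple Tests" entry, Bonferroni and Benjamini-Hochberg FDR, are simple enough to sketch in plain Python. The code below is an illustrative standalone version, not the repository script; the function names are hypothetical, and the Storey and Benjamini-Yekutieli corrections are not covered.

```python
def bonferroni(p_values):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        # Enforce monotonicity by carrying the minimum down from larger ranks.
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.005]
    print(bonferroni(pvals))
    print(benjamini_hochberg(pvals))
```

Walking from the largest p-value down lets each adjusted value be capped by the ones above it, which keeps the adjusted p-values in the same order as the raw ones.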
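
The per-cell formula quoted in the "Calculate Alt Read Ratio between Two Spreadsheets" entry, (Alt Depth)/(Alt Depth + Ref Depth), can be sketched over two equally shaped tables held as nested lists. This is a minimal standalone illustration rather than the repository script: the function name is hypothetical, and returning None for missing depths or a zero total depth is an assumption.

```python
def alt_read_ratio(alt_depth, ref_depth):
    """Return Alt / (Alt + Ref) for each cell of two equally shaped tables."""
    if len(alt_depth) != len(ref_depth):
        raise ValueError("tables must have the same number of rows")
    result = []
    for alt_row, ref_row in zip(alt_depth, ref_depth):
        if len(alt_row) != len(ref_row):
            raise ValueError("rows must have the same number of columns")
        out_row = []
        for alt, ref in zip(alt_row, ref_row):
            # Assumption: a missing depth or zero coverage gives a missing ratio.
            if alt is None or ref is None or alt + ref == 0:
                out_row.append(None)
            else:
                out_row.append(alt / (alt + ref))
        result.append(out_row)
    return result


if __name__ == "__main__":
    alt = [[3, 0], [None, 10]]
    ref = [[1, 0], [5, 30]]
    print(alt_read_ratio(alt, ref))
```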

SVS Software is intended for Research Use Only. Not for use in diagnostic procedures.
