Important update (Jan 16, 2014): I am transitioning from managing a local software repository to distributing the software via GitHub. Please see our GitHub page.

How to cite this software: For all of the packages except rhothetapost, cite the package name along with the libsequence paper:

Thornton, K. (2003) libsequence: a C++ class library for evolutionary genetic analysis. Bioinformatics 19(17): 2325-2327

For rhothetapost, cite the package name and one of the following papers, in which the method was first developed:

Haddrill, P., K.R. Thornton, B. Charlesworth, and P. Andolfatto. (2005) Multilocus patterns of nucleotide variability and the demographic and selection history of Drosophila melanogaster populations. Genome Research 15: 790-799

Thornton, K.R. and P. Andolfatto (2006) Approximate Bayesian Inference reveals evidence for a recent, severe, bottleneck in a Netherlands population of Drosophila melanogaster. Genetics 172: 1607-1619

The software below requires that libsequence be installed first, following the instructions for that package. The installation instructions are the same as for libsequence:

  1. unzip the software
  2. run the configure script in the directory created in step 1
  3. make
  4. sudo make install

Software dealing with sequence analysis:

analysis - C++ software for evolutionary genetic analysis. This package also requires the GNU Scientific Library (GSL) to be installed. Many Linux distros provide GSL packages, and OS X users can install it using either the fink or darwinports projects, according to their preference (I prefer darwinports, for what that's worth). However, to make life easier on yourself, I recommend that OS X users install the GSL directly from the source code available at the GSL homepage, because I have not modified the build system to deal with lib directories other than /usr/local/lib and /usr/lib. The GSL is used to calculate chi-squared probabilities for the program MKtest. If you're not aware of it, the GSL is a C library for numeric computation, essentially a modern version of "Numerical Recipes in C".

There are manpages for several of the programs in the analysis package (these may be out of date; up-to-date versions are installed with the packages themselves):

  1. compute, a "mini-DNAsp" for the Unix command line
  2. gestimator, Ka/Ks by Comeron's method
  3. kimura80, to calculate divergence using Kimura's (1980) method
  4. polydNdS, to analyze silent and replacement polymorphism
  5. MKtest, to perform McDonald and Kreitman tests
  6. rsq, to summarize linkage disequilibrium in data
  7. descPoly, a program to output a qualitative summary of features of sequence polymorphism data
  8. sharedPoly, a program to calculate the number of shared polymorphisms between two partitions of an alignment
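
As a rough illustration of the kind of chi-squared probability mentioned above for MKtest, here is a minimal Python sketch of a Pearson chi-squared test on a 2x2 table with one degree of freedom. It is not the MKtest code and does not use the GSL: the function name, the layout of the table, and the counts are made up for illustration, and no continuity correction is applied.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 d.f.) for the table
        [[a, b],
         [c, d]]
    No continuity correction is applied (an assumption of this sketch)."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    statistic = n * (a * d - b * c) ** 2 / denominator
    # With 1 degree of freedom the upper tail probability
    # is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Made-up counts, purely for illustration.
statistic, p_value = chi_squared_2x2(20, 10, 10, 20)
print(statistic, p_value)
```

For real analyses, use MKtest itself; the sketch only shows where a chi-squared probability enters.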

sequtils - software for sequence manipulation.

Manpages are available online for the following programs in the sequtils package (these may be out of date; up-to-date versions are installed with the packages themselves):

  1. clustalwtofasta
  2. revcom
  3. toLDhat
  4. trimallgaps

Software dealing with analysis of coalescent simulation:

msstats - Reads in data from Hudson's coalescent simulation program ms and calculates several common summary statistics. The output is a tab-delimited list of statistics, with a header line so that the file can be easily processed in R.

example usage: ms 50 10000 -t 20 | msstats
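
Because the output is a tab-delimited table with a header line, it can also be read with a few lines of Python. The sketch below is only an illustration: the column names S and pi and the numbers are made up and are not taken from actual msstats output, and it assumes every field is numeric.

```python
import csv
import io

def read_stat_table(stream):
    """Read a tab-delimited table with a header line into a list of dicts,
    converting every field to float (assumes all columns are numeric)."""
    reader = csv.DictReader(stream, delimiter="\t")
    rows = []
    for record in reader:
        rows.append({name: float(value) for name, value in record.items()})
    return rows

def column_mean(rows, name):
    """Mean of one named column across all rows."""
    values = [row[name] for row in rows]
    return sum(values) / len(values)

# Made-up table standing in for a saved output file; the column
# names here are hypothetical.
example = io.StringIO("S\tpi\n12\t3.5\n8\t2.5\n")
rows = read_stat_table(example)
print(column_mean(rows, "S"))
```

To use it on real output, pass an open file handle (or sys.stdin) instead of the StringIO object and look up the actual column names in the header line.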

msff - Applies a frequency filter to the output of Dick Hudson's coalescent simulation program ms. Use the -m flag to filter on the minor allele frequency, or -d to filter on the derived allele frequency. The filtered data are printed to stdout. The frequency filter removes sites where the relevant frequency is less than or equal to the input value. Frequencies are input as decimals on the interval [0,1]. For example, to calculate LD-related statistics using my msld package, but filtering out sites where the minor allele frequency is less than or equal to 10% in the sample:

ms 10 10000 -t 20 -r 20 1000 | msff -m 0.10 | msld > out.
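
The filtering rule itself can be sketched in Python as follows. This is not the msff source code: the function name and the in-memory representation (a list of 0/1 strings, one per sampled chromosome, with 1 taken to mark the derived state) are assumptions made for illustration.

```python
def frequency_filter(haplotypes, positions, threshold, minor=True):
    """Drop sites whose frequency is less than or equal to threshold.

    haplotypes: list of equal-length strings of '0' and '1'
                ('1' is assumed to mark the derived state)
    positions:  list with one position per site
    minor:      True filters on the minor allele frequency,
                False on the derived allele frequency
    """
    n = len(haplotypes)
    keep = []
    for site in range(len(positions)):
        derived = sum(1 for hap in haplotypes if hap[site] == "1")
        count = min(derived, n - derived) if minor else derived
        # A site is kept only if its frequency is strictly greater
        # than the threshold.
        if count / n > threshold:
            keep.append(site)
    new_positions = [positions[i] for i in keep]
    new_haplotypes = ["".join(hap[i] for i in keep) for hap in haplotypes]
    return new_positions, new_haplotypes

# Made-up sample of four chromosomes and four sites.
haps = ["1000", "1100", "0100", "0101"]
pos = [0.1, 0.2, 0.3, 0.4]
print(frequency_filter(haps, pos, 0.25, minor=True))
print(frequency_filter(haps, pos, 0.25, minor=False))
```

In this made-up sample, the second site has a derived frequency of 0.75 and a minor allele frequency of 0.25, so it survives the derived-frequency filter but not the minor-frequency filter at a threshold of 0.25.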

rhothetapost - Estimates mutation and recombination rates from multilocus polymorphism data. Described in Haddrill et al. (2005) and Thornton and Andolfatto (2006). Documentation is here.

omega - Calculates Kim and Nielsen's (2004, Genetics 167: 1513) "omega_max" statistic, which was explored in Jensen et al. (2007, Genetics 176: 2371-2379). Please read the source code for documentation. Both Kim and Nielsen and Jensen et al. should be cited if this code is used: the first for the statistic, the latter for the implementation.