How to cite this software: For all of the packages except rhothetapost, cite the package name along with the libsequence paper:
Thornton, K. (2003) libsequence: a C++ class library for evolutionary genetic analysis. Bioinformatics 19(17): 2325-2327
For rhothetapost, cite the package name and one of the following papers, in which the method was first developed:
Haddrill, P., K.R. Thornton, B. Charlesworth, and P. Andolfatto. (2005) Multilocus patterns of nucleotide variability and the demographic and selection history of Drosophila melanogaster populations. Genome Research 15: 790-799
Thornton, K.R. and P. Andolfatto (2006) Approximate Bayesian Inference reveals evidence for a recent, severe, bottleneck in a Netherlands population of Drosophila melanogaster. Genetics 172: 1607-1619
The software below requires that libsequence be installed first, following the instructions for that package. The installation instructions are the same as for libsequence:
- unzip the software
- run the configure script in the directory created by unzipping
- make
- sudo make install
Software dealing with sequence analysis:
analysis - C++ software for evolutionary genetic analysis. This package also requires the GNU Scientific Library (GSL) to be installed. Many Linux distributions provide GSL packages, and OS X users can install it using either the fink or darwinports projects, according to their preference. (I prefer darwinports, for what that's worth.) However, to make life easier on yourself, I recommend that OS X users install the GSL directly from the source code available at the GSL homepage, because I have not modified the build system to deal with lib directories other than /usr/local/lib and /usr/lib. The GSL is used to calculate chi-squared probabilities for the program MKtest. If you're not aware of it, the GSL is a C library for numeric computation, essentially a modern version of "Numerical Recipes in C".
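To make the chi-squared step concrete, here is a minimal Python sketch of a Pearson chi-squared test on a 2x2 table, the kind of table a McDonald and Kreitman test compares (fixed versus polymorphic, replacement versus silent). It is not MKtest's code: the function names are made up for this illustration, no continuity correction is applied, and MKtest itself obtains its probabilities from the GSL. With one degree of freedom the upper-tail probability can be written with the complementary error function, so no extra library is needed.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero; the statistic is undefined")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Illustrative counts only (hypothetical, not taken from any data set):
# fixed replacement, fixed silent, polymorphic replacement, polymorphic silent
stat = chi2_2x2(7, 17, 2, 42)
print(stat, chi2_sf_1df(stat))
```

For tables with small counts an exact test is usually preferred over the chi-squared approximation.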
There are manpages for several of the programs in the analysis package (these may be out-of-date; up-to-date versions are installed with the packages themselves):
- compute, a "mini-DNAsp" for the Unix command-line
- gestimator, Ka/Ks by Comeron's method
- kimura80, to calculate divergence using Kimura's (1980) method
- polydNdS, to analyze silent and replacement polymorphism
- MKtest, to perform McDonald and Kreitman tests
- rsq, to summarize linkage disequilibrium in data
- descPoly, a program to output a qualitative summary of features of sequence polymorphism data
- sharedPoly, a program to calculate number of shared polymorphisms between 2 partitions of an alignment
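As an illustration of what a Kimura (1980) divergence calculation involves, the sketch below computes the two-parameter distance K = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of sites differing by a transition and by a transversion. It is a sketch only, not the kimura80 program: the function name is hypothetical, and skipping every site that is not A, C, G, or T in both sequences is an assumption of this example rather than documented behaviour of the package.

```python
import math


def k80_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    purines = {"A", "G"}
    compared = transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # assumption: ignore gaps and ambiguous characters
        compared += 1
        if x == y:
            continue
        if (x in purines) == (y in purines):
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    a = 1.0 - 2.0 * p - q
    b = 1.0 - 2.0 * q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("sequences too divergent for the correction to be defined")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


print(k80_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

The distance is undefined when the sequences are saturated with differences, which is why the sketch raises an error instead of returning a number in that case.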
sequtils - software for sequence manipulation.
Manpages are available online for the following programs in the sequtils package (these may be out-of-date; up-to-date versions are installed with the packages themselves):
Software dealing with analysis of coalescent simulation:
msstats - Reads in data from Hudson's coalescent simulation program ms and calculates several common summary statistics. The output is a tab-delimited list of statistics, with a header line so that the file can be easily processed in R.
example usage: ms 50 10000 -t 20 | msstats
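Because the output is tab-delimited with a header line, it can also be summarized outside of R. The sketch below averages every numeric column using only the Python standard library. The function name and the example column names (rep, S, pi) are placeholders invented for this illustration and are not claimed to match the real msstats header; the small table is made up as well.

```python
import csv
import io
import math
from statistics import mean


def column_means(handle):
    """Return {column name: mean of its numeric values} for tab-delimited text with a header."""
    reader = csv.DictReader(handle, delimiter="\t")
    columns = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name, text in row.items():
            if name not in columns:
                continue  # fields beyond the header are ignored
            try:
                value = float(text)
            except (TypeError, ValueError):
                continue  # missing or non-numeric entry
            if not math.isnan(value):
                columns[name].append(value)
    return {name: mean(values) for name, values in columns.items() if values}


# Made-up table, NOT real msstats output; the column names are placeholders.
example = "rep\tS\tpi\n0\t10\t2.5\n1\t14\t3.5\n"
print(column_means(io.StringIO(example)))
```

To summarize a real run, pass sys.stdin (after importing sys) instead of the StringIO object and pipe the msstats output into the script.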
msff - Applies a frequency filter to the output of Dick Hudson's coalescent simulation program ms. Using the -m flag, it filters on the minor allele frequency. Use -d to filter on the derived allele frequency. The filtered data are printed to stdout. The frequency filter removes sites where the relevant frequency is less than or equal to the input value. Frequencies are input as decimals on the interval [0,1]. For example, to calculate LD-related statistics using my msld package, but filtering out sites where the minor allele frequency is less than or equal to 10% in the sample:
ms 10 10000 -t 20 -r 20 1000 | msff -m 0.10 | msld > out.
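The filtering rule can be sketched in a few lines of Python. This is an illustration of the rule as described above, not the msff source: the function name and the list-of-strings input are assumptions of this example. It relies on the ms convention that 0 marks the ancestral state and 1 the derived state, and it drops a site when the relevant frequency is less than or equal to the threshold.

```python
def frequency_filter(haplotypes, threshold, use_minor=True):
    """Keep sites whose minor (or derived) allele frequency exceeds threshold.

    haplotypes: equal-length strings of '0' (ancestral) and '1' (derived),
    one string per sampled chromosome.
    Returns (filtered haplotypes, indices of the retained sites).
    """
    if not haplotypes:
        return [], []
    n = len(haplotypes)
    kept = []
    for j in range(len(haplotypes[0])):
        derived = sum(h[j] == "1" for h in haplotypes)
        count = min(derived, n - derived) if use_minor else derived
        if count / n > threshold:  # frequency <= threshold removes the site
            kept.append(j)
    filtered = ["".join(h[j] for j in kept) for h in haplotypes]
    return filtered, kept


# Ten hypothetical chromosomes, three sites with 1, 5, and 9 derived copies.
sample = ["".join("1" if i < k else "0" for k in (1, 5, 9)) for i in range(10)]
print(frequency_filter(sample, 0.10, use_minor=True)[1])
print(frequency_filter(sample, 0.10, use_minor=False)[1])
```

Comparing the two calls shows how the two modes differ: a site where the derived allele is nearly fixed has a low minor allele frequency but a high derived allele frequency.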
rhothetapost - Estimates mutation and recombination rates from multilocus polymorphism data. Described in Haddrill et al. (2005) and Thornton and Andolfatto (2006). Documentation is here