Man page of COMPUTE
COMPUTE
Section: User Commands (1)
Updated: April 3, 2002
NAME
compute - a program to summarize a lot of molecular population genetic data at once.
SYNOPSIS
compute -i <infile> [options]
DESCRIPTION
compute calculates a range of summary statistics of interest in molecular population genetics. Most of the output consists of various estimators of theta (= 4Nu) and summaries of the site frequency spectrum (Tajima's D, Fay and Wu's H, and the Fu and Li statistics). It also outputs Hudson's C, a moment estimator of the population recombination rate. The output is a simple tab-delimited table with a header, which can be loaded into a spreadsheet.
You decide which sites to exclude, with one exception: sites with any state not in the set {A,G,C,T,N,-} are always excluded from the analysis. Sites with more than two states, or sites with missing data, can be excluded with the appropriate command-line options (-b and -N, described under OPTIONS).
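The site-exclusion rules and one of the theta estimators mentioned above can be illustrated with a minimal Python sketch. It is not taken from the compute source: the function names are hypothetical, treating N and - as missing data that do not count as character states is an assumption, and the sketch counts segregating sites, whereas compute by default works with the inferred number of mutations. Watterson's estimator is computed as S divided by the sum of 1/i for i from 1 to n-1.

```python
from math import fsum

# States that compute accepts; any other character excludes the site.
VALID_STATES = set("AGCTN-")
NUCLEOTIDES = set("AGCT")


def keep_site(states, biallelic_only=False, no_missing=False):
    """Decide whether one alignment column is analyzed.

    biallelic_only and no_missing loosely mirror the -b and -N options.
    Ignoring N and - when counting states is an assumption of this sketch.
    """
    if any(s not in VALID_STATES for s in states):
        return False  # always excluded, regardless of options
    if no_missing and "N" in states:
        return False
    if biallelic_only and len(set(states) & NUCLEOTIDES) > 2:
        return False
    return True


def count_segregating_sites(alignment, **filters):
    """Count retained columns that show at least two nucleotide states."""
    count = 0
    for site in range(len(alignment[0])):
        states = [seq[site].upper() for seq in alignment]
        if not keep_site(states, **filters):
            continue
        if len(set(states) & NUCLEOTIDES) >= 2:
            count += 1
    return count


def watterson_theta(num_segregating, sample_size):
    """Watterson's estimator: S / sum_{i=1}^{n-1} 1/i."""
    if sample_size < 2:
        raise ValueError("need at least two sequences")
    a_n = fsum(1.0 / i for i in range(1, sample_size))
    return num_segregating / a_n


# Hypothetical toy alignment of four sequences.
alignment = ["ACGTAA", "ACGTTA", "ATGNAG", "AXGTCN"]
s_all = count_segregating_sites(alignment)
s_biallelic = count_segregating_sites(alignment, biallelic_only=True)
s_no_missing = count_segregating_sites(alignment, no_missing=True)
theta_w = watterson_theta(s_all, len(alignment))
```

In the toy alignment, the second column contains an X and is dropped under every setting, the fifth column has three states, and the sixth contains an N, so each filter removes a different segregating site.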
OPTIONS
compute accepts the following options:
- -i <infile>
-
specify a file containing aligned sequences in FASTA format to analyze. Formally, this option takes a pattern that is passed to glob() (see man 3 glob) to find the input files, so many files can be processed at once. For example, compute -i '*.fasta' will analyze all files with the .fasta extension in the current directory. Note that the pattern is in single quotes in this case, to prevent the shell from expanding the wildcard character before compute sees it.
- -h <infile>
-
specify an infile in the "Hudson2001" SNP table format. Note that this format makes the -O option described below redundant, since the presence of an outgroup is obvious from the data file.
- -o <outfile>
-
specify an outfile to write output to. Otherwise, output is printed to stdout.
- -O <outgroup>
-
<outgroup> is an integer specifying which sequence in the data file is the outgroup (counting starts from 1).
- -n
-
where appropriate, use the number of segregating sites in all calculations (rather than the default, which is to use the inferred number of mutations).
- -s
-
suppress output of header information
- -P
-
print out just a polymorphism table for the data, and then exit
- -b
-
only analyze bi-allelic sites (this means 2 and only 2 character states present, including the outgroup)
- -N
-
only analyze segregating sites that contain no missing data. (i.e., exclude all sites containing the character N.)
- -p
-
calculate the probabilities of haplotype number, haplotype diversity, and several summary statistics using coalescent simulation. The probabilities are estimated from 10,000 random genealogies assuming no recombination. The sample size in the simulations is the sample size of the ingroup in the data file. By default, the simulation uses the "fixed segregating sites" method (Hudson 1993, "The how and why of generating gene genealogies", in Takahata and Clark, eds., "Mechanisms of Molecular Evolution", published by Sinauer Associates), fixing the number of polymorphisms on each genealogy to the number of mutations inferred from the data. However, if the data are processed with the -n option, the simulations use the number of segregating sites rather than the number of mutations.
- -t
-
when using the -p option, this flag tells compute to use Watterson's Theta instead of the number of mutations/segregating sites in the coalescent simulation
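Because the -i option takes a pattern rather than a single file name, it may help to see how such a pattern expands. The sketch below uses Python's glob module, which follows rules similar to glob(3); it is only an illustration of pattern expansion, not compute's own code, and the file names and the helper expand_infiles are hypothetical.

```python
import glob
import os
import tempfile


def expand_infiles(pattern):
    """Return the files matching a shell-style pattern, in sorted order."""
    return sorted(glob.glob(pattern))


# Create a scratch directory with a few hypothetical input files.
with tempfile.TemporaryDirectory() as workdir:
    for name in ("locus1.fasta", "locus2.fasta", "notes.txt"):
        open(os.path.join(workdir, name), "w").close()

    # The pattern reaches the program unexpanded, as it does when it is
    # single-quoted on the command line, and is expanded here instead.
    matches = expand_infiles(os.path.join(workdir, "*.fasta"))
    names = [os.path.basename(path) for path in matches]
    print(names)
```

If the pattern were left unquoted on the command line, the shell would expand it first and the program would receive a list of file names rather than one pattern.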
AUTHOR
Kevin Thornton <k-thornton@uchicago.edu>