Sequence::PolySIM Class Reference
[Analysis of molecular population genetic data]

Analysis of coalescent simulation data.

#include <Sequence/PolySIM.hpp>

Inherits Sequence::PolySNP.


Public Member Functions

 PolySIM (const Sequence::SimData *data)
double ThetaPi (void)
double ThetaW (void)
double ThetaH (void)
double ThetaL (void)
unsigned NumMutations (void)
unsigned NumSingletons (void)
unsigned NumExternalMutations (void)
double TajimasD (void)
double Hprime (bool likeThorntonAndolfatto=false)
double Dnominator (void)
double FuLiD (void)
double FuLiF (void)
double FuLiDStar (void)
double FuLiFStar (void)
double WallsB (void)
unsigned WallsBprime (void)
double WallsQ (void)
int HudsonsHaplotypeTest (int subsize, int subss)
unsigned Minrec (void)
double VarPi (void)
double StochasticVarPi (void)
double SamplingVarPi (void)
double VarThetaW (void)
unsigned NumPoly (void)
double DandVH (void)
unsigned DandVK (void)
double HudsonsC (void)
std::vector< std::vector< double > > Disequilibrium (const unsigned &mincount=1, const double &max_marker_distance=std::numeric_limits< double >::max())

Protected Member Functions

void WallStats (void)
void DepaulisVeuilleStatistics (void)
double a_sub_n (void)
double a_sub_n_plus1 (void)
double b_sub_n (void)
double b_sub_n_plus1 (void)
double c_sub_n (void)
double d_sub_n (void)

Protected Attributes

std::auto_ptr< _PolySNPImpl > rep

Detailed Description

Analysis of coalescent simulation data.

This class inherits from Sequence::PolySNP. It is a collection of analysis routines for coalescent simulation data, and is constructed from a const Sequence::SimData *. The main difference from PolySNP is that outgroup information is not required, because the 0/1 coding of a SimData object (usually) reflects the ancestral and derived states.

Examples:

bottleneck.cc, and msstats.cc.

Definition at line 44 of file PolySIM.hpp.


Constructor & Destructor Documentation

Sequence::PolySIM::PolySIM ( const Sequence::SimData * data )  [explicit]
Parameters:
data a pointer to a valid object of type Sequence::SimData

Definition at line 41 of file PolySIM.cc.


Member Function Documentation

double Sequence::PolySNP::a_sub_n ( void   )  [protected, inherited]

\[a_n=\sum_{i=1}^{i=n-1}\frac{1}{i}.\ \]

This is the denominator of Watterson's Theta (see PolySNP::ThetaW)

Warning:
statistic undefined if there are untyped SNPs

Definition at line 1149 of file PolySNP.cc.

double Sequence::PolySNP::a_sub_n_plus1 ( void   )  [protected, inherited]

\[a_{n+1}=\sum_{i=1}^{i=n}\frac{1}{i}\ \]

Warning:
statistic undefined if there are untyped SNPs

Definition at line 1165 of file PolySNP.cc.

double Sequence::PolySNP::b_sub_n ( void   )  [protected, inherited]

\[b_n=\sum_{i=1}^{i=n-1}\frac{1}{i^2}\ \]

Warning:
statistic undefined if there are untyped SNPs

Definition at line 1182 of file PolySNP.cc.

double Sequence::PolySNP::b_sub_n_plus1 ( void   )  [protected, inherited]

\[b_{n+1}=\sum_{i=1}^{i=n}\frac{1}{i^2}\ \]

Warning:
statistic undefined if there are untyped SNPs
Author:
Joshua Shapiro

Definition at line 1197 of file PolySNP.cc.
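The four sums above are partial harmonic series. The following standalone sketch evaluates them directly from the documented formulas. The function names are hypothetical and not part of libsequence, whose protected members take no arguments.

```cpp
#include <cassert>

// a_n = sum_{i=1}^{n-1} 1/i, for sample size n (hypothetical helper)
double harmonic_a(unsigned n) {
    assert(n >= 2);
    double sum = 0.0;
    for (unsigned i = 1; i < n; ++i) sum += 1.0 / i;
    return sum;
}

// b_n = sum_{i=1}^{n-1} 1/i^2 (hypothetical helper)
double harmonic_b(unsigned n) {
    assert(n >= 2);
    double sum = 0.0;
    for (unsigned i = 1; i < n; ++i) sum += 1.0 / (double(i) * i);
    return sum;
}

// a_{n+1} and b_{n+1} are the same sums carried one term further (up to i = n).
double harmonic_a_plus1(unsigned n) { return harmonic_a(n + 1); }
double harmonic_b_plus1(unsigned n) { return harmonic_b(n + 1); }
```

For n = 4, harmonic_a(4) is 1 + 1/2 + 1/3 = 11/6.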

double Sequence::PolySNP::c_sub_n ( void   )  [protected, inherited]

\[ c_n=\left\{\begin{array}{cl} 1 , & when\ n = 2 \\ \frac{2 \times (n \times a_n - 2 \times (n-1))}{(n-1) \times (n-2)}, & when\ n > 2 \\ \end{array}\right.\ \]

Warning:
statistic undefined if there are untyped SNPs

Definition at line 1213 of file PolySNP.cc.

double Sequence::PolySNP::d_sub_n ( void   )  [protected, inherited]

\[\ d_n=\frac{2}{n-1} \times (1.5 - \frac{2 \times a_{n+1}}{n-2} - \frac{1}{n})\ \]

Warning:
statistic undefined if there are untyped SNPs

Definition at line 1239 of file PolySNP.cc.

double Sequence::PolySNP::DandVH ( void   )  [inherited]

To decide whether two sequences are distinct, Sequence::Comparisons::Different is used, which does not allow missing data to make two sequences count as different (as they would if you simply used the std::string comparison operators == or !=).

Returns:
the haplotype diversity of the data.
Examples:
bottleneck.cc.

Definition at line 1256 of file PolySNP.cc.

unsigned Sequence::PolySNP::DandVK ( void   )  [inherited]

To decide whether two sequences are distinct, Sequence::Comparisons::Different is used, which does not allow missing data to make two sequences count as different (as they would if you simply used the std::string comparison operators == or !=).

Returns:
number of haplotypes in the sample

Definition at line 1273 of file PolySNP.cc.

void Sequence::PolySNP::DepaulisVeuilleStatistics ( void   )  [protected, inherited]

Calculates the number of haplotypes in the sample and the haplotype diversity. Unlike Depaulis and Veuille's original paper, this routine uses an unbiased estimate of haplotype diversity (i.e., it divides by n choose 2).
To decide whether two sequences are distinct, Sequence::Comparisons::Different is used, which does not allow missing data to make two sequences count as different (as they would if you simply used the std::string comparison operators == or !=).

Definition at line 753 of file PolySNP.cc.
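Dividing the number of differing pairs by n choose 2 gives the same value as n/(n-1) (1 - sum_i p_i^2), where p_i is the frequency of haplotype i, so the unbiased diversity can be computed by direct pair counting. The sketch below assumes simulated 0/1 haplotypes without missing data, so plain std::string equality stands in for Sequence::Comparisons::Different. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Number of distinct haplotypes (cf. DandVK); assumes no missing data.
unsigned num_haplotypes(const std::vector<std::string>& haps) {
    std::set<std::string> distinct(haps.begin(), haps.end());
    return static_cast<unsigned>(distinct.size());
}

// Unbiased haplotype diversity (cf. DandVH): fraction of the n choose 2
// pairs of haplotypes that differ.
double haplotype_diversity(const std::vector<std::string>& haps) {
    const std::size_t n = haps.size();
    assert(n >= 2);
    std::size_t differing = 0;
    for (std::size_t i = 0; i + 1 < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            if (haps[i] != haps[j]) ++differing;
    const double pairs = 0.5 * double(n) * double(n - 1);
    return differing / pairs;
}
```

With haplotypes {"01", "01", "10", "11"}, 5 of the 6 pairs differ, so the diversity is 5/6 and there are 3 haplotypes.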

std::vector< std::vector< double > > Sequence::PolySNP::Disequilibrium ( const unsigned & mincount = 1, const double & max_marker_distance = std::numeric_limits<double>::max() ) [inherited]
Returns:
A vector of statistics related to LD and distance in the sample. An empty vector is returned if there are < 2 polymorphic sites in the sample. See the documentation for Recombination::Disequilibrium for a description of the return vector.
Parameters:
mincount a frequency filter: a polymorphism must be present at least mincount times in the data.
Note:
For D and D', the 11 gamete is defined as follows: If no outgroup is present, it refers to the genotype of minor alleles at both sites. If there is an outgroup, it is based on the genotype of derived alleles at both sites.

Definition at line 1420 of file PolySNP.cc.
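The layout of the returned vector is described under Recombination::Disequilibrium and is not reproduced here. To illustrate the 11-gamete convention in the note above, this hypothetical sketch computes D, D' and r^2 for a single pair of 0/1 sites, taking 1 as the derived state as in simulated data. It uses the standard definitions, with D' equal to D divided by its maximum attainable magnitude given the allele frequencies; whether Disequilibrium scales D' the same way is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type; NOT the layout returned by Disequilibrium().
struct PairLD {
    double D;       // p11 - p1 * q1
    double Dprime;  // D / Dmax (signed)
    double rsq;     // D^2 / (p1 (1 - p1) q1 (1 - q1))
};

// site1, site2: 0/1 states (1 = derived) of the same n chromosomes.
// Assumes both sites are polymorphic, so no denominator is zero.
PairLD pair_ld(const std::vector<int>& site1, const std::vector<int>& site2) {
    assert(site1.size() == site2.size() && !site1.empty());
    const double n = double(site1.size());
    double c1 = 0, c2 = 0, c11 = 0;
    for (std::size_t k = 0; k < site1.size(); ++k) {
        c1 += site1[k];
        c2 += site2[k];
        c11 += (site1[k] == 1 && site2[k] == 1);  // the "11" gamete
    }
    const double p1 = c1 / n, q1 = c2 / n, p11 = c11 / n;
    const double D = p11 - p1 * q1;
    const double Dmax = (D > 0) ? std::min(p1 * (1 - q1), (1 - p1) * q1)
                                : std::min(p1 * q1, (1 - p1) * (1 - q1));
    const double rsq = D * D / (p1 * (1 - p1) * q1 * (1 - q1));
    return PairLD{D, D / Dmax, rsq};
}
```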

double Sequence::PolySIM::Dnominator ( void   )  [virtual]
Warning:
statistic undefined if there are untyped SNPs
Returns:
Denominator of Tajima's D, or nan if there are no polymorphic sites

Reimplemented from Sequence::PolySNP.

Definition at line 181 of file PolySIM.cc.

double Sequence::PolySIM::FuLiD ( void   )  [virtual]
Returns:
The Fu and Li (1993) D statistic, or nan if there are no polymorphic sites.
Note:
For sequence data, an outgroup is required. This requirement is checked by assert()
Warning:
statistic undefined if there are untyped SNPs

Reimplemented from Sequence::PolySNP.

Examples:
msstats.cc.

Definition at line 309 of file PolySIM.cc.
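A sketch of the Fu and Li (1993) D statistic in terms of the total number of mutations eta (compare NumMutations) and the number of external mutations eta_e (compare NumExternalMutations), using a_n, b_n and c_n as documented above. The function name and argument list are hypothetical; v_D and u_D are the standard Fu and Li coefficients.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Fu and Li's D for ancestral/derived (0/1) data; returns NaN if eta == 0.
double fu_li_d(unsigned n, unsigned eta, unsigned eta_e) {
    assert(n >= 2);
    if (eta == 0) return std::numeric_limits<double>::quiet_NaN();
    double a = 0.0, b = 0.0;  // a_n and b_n
    for (unsigned i = 1; i < n; ++i) {
        a += 1.0 / i;
        b += 1.0 / (double(i) * i);
    }
    const double nd = n;
    const double c = (n == 2) ? 1.0
        : 2.0 * (nd * a - 2.0 * (nd - 1.0)) / ((nd - 1.0) * (nd - 2.0));
    const double vD = 1.0 + (a * a / (b + a * a)) * (c - (nd + 1.0) / (nd - 1.0));
    const double uD = a - 1.0 - vD;
    return (eta - a * eta_e) / std::sqrt(uD * eta + vD * double(eta) * eta);
}
```

D is zero when eta equals a_n times eta_e, and negative when external mutations are in excess.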

double Sequence::PolySIM::FuLiDStar ( void   )  [virtual]
Warning:
statistic undefined if there are untyped SNPs
Returns:
Fu and Li (1993) D*, or nan if there are no polymorphic sites

Reimplemented from Sequence::PolySNP.

Examples:
msstats.cc.

Definition at line 357 of file PolySIM.cc.

double Sequence::PolySIM::FuLiF ( void   )  [virtual]
Returns:
Fu and Li (1993) F statistic, or nan if there are no polymorphic sites
Note:
For sequence data, an outgroup is required; otherwise the statistic is undefined.
Warning:
statistic undefined if there are untyped SNPs

Reimplemented from Sequence::PolySNP.

Examples:
msstats.cc.

Definition at line 329 of file PolySIM.cc.

double Sequence::PolySIM::FuLiFStar ( void   )  [virtual]

Fu and Li (1993) F* statistic. Incorporates correction from Simonsen et al. (1995) Genetics 141: 413, eqn A5.

Warning:
statistic undefined if there are untyped SNPs
Returns:
Fu and Li (1993) F* statistic, or nan if there are no polymorphic sites

Reimplemented from Sequence::PolySNP.

Examples:
msstats.cc.

Definition at line 384 of file PolySIM.cc.

double Sequence::PolySIM::Hprime ( bool  likeThorntonAndolfatto = false  )  [virtual]

Redefinition of PolySNP::Hprime

Author:
Joshua Shapiro

Reimplemented from Sequence::PolySNP.

Definition at line 142 of file PolySIM.cc.

double Sequence::PolySNP::HudsonsC ( void   )  [inherited]
Returns:
Hudson's (1987) estimator of $\rho=4Nc$, an estimator of the population recombination rate that depends on the variance of the site frequencies. The calculation is made by a call to Recombination::HudsonsC
Note:
Will return nan if there are no polymorphic sites

Definition at line 1290 of file PolySNP.cc.

int Sequence::PolySIM::HudsonsHaplotypeTest ( int  subsize,
int  subss 
)

From Hudson et al. (1994), on polymorphism at the superoxide dismutase (Sod) locus. For simulated data only. The function returns 1 if the number of polymorphisms in a randomly drawn subsample of the data is less than or equal to subss, and 0 otherwise.

Parameters:
subsize the size of the subsample
subss the number of segregating sites in the subsample
Author:
Dick Hudson
Kevin Thornton

Definition at line 202 of file PolySIM.cc.
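A minimal sketch of the procedure described above, assuming 0/1 haplotype strings and a caller-supplied std::mt19937. The helper names are hypothetical, and nothing about the library's own random number source is implied. In a simulation study, the fraction of replicates returning 1 estimates the probability of observing so few segregating sites in a subsample of that size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Number of sites that are polymorphic among the selected rows.
unsigned segsites(const std::vector<std::string>& haps,
                  const std::vector<std::size_t>& rows) {
    unsigned S = 0;
    const std::size_t nsites = haps[rows[0]].size();
    for (std::size_t s = 0; s < nsites; ++s)
        for (std::size_t r = 1; r < rows.size(); ++r)
            if (haps[rows[r]][s] != haps[rows[0]][s]) { ++S; break; }
    return S;
}

// Returns 1 if a random subsample of `subsize` haplotypes has at most
// `subss` segregating sites, 0 otherwise.
int haplotype_test(const std::vector<std::string>& haps, int subsize,
                   int subss, std::mt19937& rng) {
    assert(subsize >= 1 && subss >= 0 && std::size_t(subsize) <= haps.size());
    std::vector<std::size_t> idx(haps.size());
    for (std::size_t i = 0; i < idx.size(); ++i) idx[i] = i;
    std::shuffle(idx.begin(), idx.end(), rng);  // random order of rows
    idx.resize(std::size_t(subsize));           // keep the first subsize rows
    return segsites(haps, idx) <= unsigned(subss) ? 1 : 0;
}
```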

unsigned Sequence::PolySIM::Minrec ( void   )  [virtual]
Returns:
the minimum number of recombination events observed in the sample (Hudson and Kaplan 1985). Will return SEQMAXUNSIGNED if there are < 2 segregating sites.
Note:
code is a modification of that provided by Jeff Wall

Reimplemented from Sequence::PolySNP.

Definition at line 476 of file PolySIM.cc.
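Hudson and Kaplan's (1985) R_m can be sketched as follows: apply the four-gamete test to every pair of sites, keep the intervals spanned by incompatible pairs, and count the largest set of non-overlapping intervals, each of which must contain at least one recombination event. The code below is a hypothetical standalone version for 0/1 haplotype strings, not the Jeff Wall-derived code in PolySIM.cc. It returns std::numeric_limits<unsigned>::max() as a stand-in for SEQMAXUNSIGNED when there are fewer than two sites.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Four-gamete test for sites i and j; assumes characters are '0' or '1'.
bool incompatible(const std::vector<std::string>& haps, std::size_t i, std::size_t j) {
    bool seen[2][2] = {{false, false}, {false, false}};
    for (const std::string& h : haps) seen[h[i] - '0'][h[j] - '0'] = true;
    return seen[0][0] && seen[0][1] && seen[1][0] && seen[1][1];
}

unsigned min_rec(const std::vector<std::string>& haps) {
    assert(!haps.empty());
    const std::size_t nsites = haps[0].size();
    if (nsites < 2) return std::numeric_limits<unsigned>::max();
    using Interval = std::pair<std::size_t, std::size_t>;  // (left site, right site)
    std::vector<Interval> intervals;
    for (std::size_t i = 0; i + 1 < nsites; ++i)
        for (std::size_t j = i + 1; j < nsites; ++j)
            if (incompatible(haps, i, j)) intervals.emplace_back(i, j);
    // Greedy by right endpoint yields the maximum set of disjoint intervals.
    std::sort(intervals.begin(), intervals.end(),
              [](const Interval& a, const Interval& b) { return a.second < b.second; });
    unsigned rm = 0;
    std::size_t last_cut = 0;
    for (const Interval& iv : intervals)
        if (iv.first >= last_cut) { ++rm; last_cut = iv.second; }
    return rm;
}
```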

unsigned Sequence::PolySIM::NumExternalMutations ( void   )  [virtual]

Similar to NumSingletons, but assumes a strict ancestral/derived coding in the data, i.e., for the infinite-sites case, 0 is ancestral and 1 is derived (as in coalescent simulations). Note that Sequence does code 1 as the derived state if you have an outgroup, so in that case this is the routine to use.

Returns:
the number of derived alleles at frequency 1

Reimplemented from Sequence::PolySNP.

Definition at line 452 of file PolySIM.cc.

unsigned Sequence::PolySIM::NumMutations ( void   )  [virtual]
Returns:
number of mutations in the sample

Reimplemented from Sequence::PolySNP.

Definition at line 412 of file PolySIM.cc.

unsigned Sequence::PolySNP::NumPoly ( void   )  [inherited]
Returns:
the number of polymorphic (segregating) sites in data
Examples:
msstats.cc.

Definition at line 547 of file PolySNP.cc.

unsigned Sequence::PolySIM::NumSingletons ( void   )  [virtual]

A version optimized for simulated data where character states take on the values 0 or 1.

Returns:
number of sites where there is a mutation at frequency 1 in the sample

Reimplemented from Sequence::PolySNP.

Definition at line 423 of file PolySIM.cc.
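For 0/1 data both counts reduce to tallying derived alleles per site. In this hypothetical sketch, num_external counts sites where the derived state appears exactly once (compare NumExternalMutations), while num_singletons_folded counts sites where either state appears exactly once, the unpolarized definition used by Fu and Li's D* and F*. Whether PolySIM::NumSingletons uses exactly this folded definition is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Derived ('1') count at each site of 0/1 haplotypes.
std::vector<unsigned> derived_counts(const std::vector<std::string>& haps) {
    assert(!haps.empty());
    std::vector<unsigned> counts(haps[0].size(), 0);
    for (const std::string& h : haps)
        for (std::size_t s = 0; s < h.size(); ++s)
            if (h[s] == '1') ++counts[s];
    return counts;
}

// Sites where the derived state occurs exactly once.
unsigned num_external(const std::vector<std::string>& haps) {
    unsigned total = 0;
    for (unsigned k : derived_counts(haps)) total += (k == 1);
    return total;
}

// Sites where either state occurs exactly once (assumed folded definition).
unsigned num_singletons_folded(const std::vector<std::string>& haps) {
    const unsigned n = static_cast<unsigned>(haps.size());
    unsigned total = 0;
    for (unsigned k : derived_counts(haps)) total += (k == 1 || k + 1 == n);
    return total;
}
```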

double Sequence::PolySNP::SamplingVarPi ( void   )  [inherited]

Component of the variance of mean pairwise differences due to sampling. Tajima, in the Takahata and Clark book, eq. (15).

Warning:
statistic undefined if there are untyped SNPs

Definition at line 976 of file PolySNP.cc.

double Sequence::PolySNP::StochasticVarPi ( void   )  [inherited]

Stochastic variance of mean pairwise differences. Tajima, in the Takahata and Clark book, eq. (14).

Warning:
statistic undefined if there are untyped SNPs

Definition at line 962 of file PolySNP.cc.

double Sequence::PolySIM::TajimasD ( void   )  [virtual]

A common summary of the site frequency spectrum, proportional to $\widehat\theta_\pi-\widehat\theta_W$. This routine computes the complete statistic, including its denominator (see Dnominator).

Warning:
statistic undefined if there are untyped SNPs
Returns:
Tajima's D, or nan if there are no polymorphic sites

Reimplemented from Sequence::PolySNP.

Examples:
bottleneck.cc, and msstats.cc.

Definition at line 133 of file PolySIM.cc.
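A standalone sketch of Tajima's (1989) D from the sample size n, the mean pairwise differences (ThetaPi) and the number of segregating sites S. The square root in the last step plays the role of the value reported by Dnominator, assuming the library uses the standard e1/e2 coefficients. The name is hypothetical. For n = 3 the numerator and the variance term are both identically zero, so the sketch requires n >= 4.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Tajima's D; returns NaN when there are no segregating sites.
double tajimas_d(unsigned n, double pi, unsigned S) {
    assert(n >= 4);  // degenerate for n < 4
    if (S == 0) return std::numeric_limits<double>::quiet_NaN();
    double a1 = 0.0, a2 = 0.0;
    for (unsigned i = 1; i < n; ++i) { a1 += 1.0 / i; a2 += 1.0 / (double(i) * i); }
    const double nd = n;
    const double b1 = (nd + 1.0) / (3.0 * (nd - 1.0));
    const double b2 = 2.0 * (nd * nd + nd + 3.0) / (9.0 * nd * (nd - 1.0));
    const double c1 = b1 - 1.0 / a1;
    const double c2 = b2 - (nd + 2.0) / (a1 * nd) + a2 / (a1 * a1);
    const double e1 = c1 / a1;
    const double e2 = c2 / (a1 * a1 + a2);
    // Square root of the estimated variance of (pi - S / a1).
    const double denom = std::sqrt(e1 * S + e2 * S * (S - 1.0));
    return (pi - S / a1) / denom;
}
```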

double Sequence::PolySIM::ThetaH ( void   )  [virtual]

For simulated data, where 0 is ancestral and 1 is derived.
A simpler version of PolySIM::ThetaH (const Sequence::PolyTable * data, bool haveOutgroup = 0, unsigned outgroup = 0)

Reimplemented from Sequence::PolySNP.

Examples:
msstats.cc.

Definition at line 88 of file PolySIM.cc.

double Sequence::PolySIM::ThetaL ( void   )  [virtual]

For simulated data, where 0 is ancestral and 1 is derived.
A simpler version of PolySNP::ThetaL()

Author:
Joshua Shapiro

Reimplemented from Sequence::PolySNP.

Definition at line 106 of file PolySIM.cc.
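For reference, with S_i the number of sites at which the derived state appears i times in a sample of size n, the standard definitions are theta_H = sum_i 2 i^2 S_i / (n(n-1)) (Fay and Wu 2000) and theta_L = sum_i i S_i / (n-1) (Zeng et al. 2006). The hypothetical sketch below evaluates both from a vector of per-site derived counts; whether PolySIM applies any further adjustment is not shown here.

```cpp
#include <cassert>
#include <vector>

// counts[s] = number of derived copies at segregating site s; n = sample size.
double theta_h(const std::vector<unsigned>& counts, unsigned n) {
    assert(n >= 2);
    double sum = 0.0;
    for (unsigned k : counts) sum += double(k) * k;  // sum_i i^2 S_i
    return 2.0 * sum / (double(n) * (n - 1));
}

double theta_l(const std::vector<unsigned>& counts, unsigned n) {
    assert(n >= 2);
    double sum = 0.0;
    for (unsigned k : counts) sum += k;  // sum_i i S_i
    return sum / (n - 1);
}
```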

double Sequence::PolySIM::ThetaPi ( void   )  [virtual]

For simulated data, assuming 0 is ancestral and 1 is derived.
A simpler version of PolySNP::ThetaPi

Reimplemented from Sequence::PolySNP.

Examples:
bottleneck.cc, and msstats.cc.

Definition at line 55 of file PolySIM.cc.
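With 0/1 coding, a site with k derived copies contributes k(n-k) differing pairs, so the mean pairwise difference is a per-site sum rather than an all-pairs comparison of sequences. A hypothetical standalone sketch for 0/1 haplotype strings:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Mean pairwise differences: sum over sites of k (n - k) / (n choose 2).
double theta_pi(const std::vector<std::string>& haps) {
    const std::size_t n = haps.size();
    assert(n >= 2);
    const std::size_t nsites = haps[0].size();
    const double pairs = 0.5 * double(n) * double(n - 1);
    double pi = 0.0;
    for (std::size_t s = 0; s < nsites; ++s) {
        double k = 0.0;  // derived count at site s
        for (const std::string& h : haps) k += (h[s] == '1');
        pi += k * (double(n) - k) / pairs;
    }
    return pi;
}
```

For {"110", "010", "011", "001"} the per-site contributions are 3/6, 3/6 and 4/6, giving 10/6, the same as averaging the six pairwise distances directly.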

double Sequence::PolySIM::ThetaW ( void   )  [virtual]

For coalescent simulation data, the number of segregating sites equals the number of mutations on the tree (under the infinite-sites model).

Reimplemented from Sequence::PolySNP.

Examples:
msstats.cc.

Definition at line 76 of file PolySIM.cc.
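A hypothetical standalone sketch of Watterson's estimator for 0/1 haplotypes: count segregating sites and divide by a_n (see a_sub_n). Monomorphic columns, if present, are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

double theta_w(const std::vector<std::string>& haps) {
    const std::size_t n = haps.size();
    assert(n >= 2);
    unsigned S = 0;  // segregating sites
    for (std::size_t s = 0; s < haps[0].size(); ++s) {
        bool has0 = false, has1 = false;
        for (const std::string& h : haps) {
            has0 |= (h[s] == '0');
            has1 |= (h[s] == '1');
        }
        S += (has0 && has1);
    }
    double a_n = 0.0;
    for (std::size_t i = 1; i < n; ++i) a_n += 1.0 / double(i);
    return S / a_n;
}
```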

double Sequence::PolySNP::VarPi ( void   )  [inherited]

Total variance of mean pairwise differences. Tajima, in the Takahata and Clark book, eq. (13).

Warning:
statistic undefined if there are untyped SNPs

Definition at line 948 of file PolySNP.cc.

double Sequence::PolySNP::VarThetaW ( void   )  [inherited]
Returns:
Variance of Watterson's Theta (ThetaW()).
Warning:
statistic undefined if there are untyped SNPs

Definition at line 993 of file PolySNP.cc.
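Both variances have standard closed forms in theta. The sketch below uses Tajima's (1983) total variance of the mean pairwise differences and Watterson's (1975) variance of S/a_n. The function names are hypothetical, and which estimate of theta the library substitutes is left to the caller as an assumption. The split into stochastic and sampling components is not reproduced.

```cpp
#include <cassert>

// Var(pi) = (n+1)/(3(n-1)) theta + 2(n^2+n+3)/(9n(n-1)) theta^2
double var_pi(unsigned n, double theta) {
    assert(n >= 2);
    const double nd = n;
    return (nd + 1.0) / (3.0 * (nd - 1.0)) * theta
         + 2.0 * (nd * nd + nd + 3.0) / (9.0 * nd * (nd - 1.0)) * theta * theta;
}

// Var(S / a_n) = theta / a_n + b_n theta^2 / a_n^2
double var_theta_w(unsigned n, double theta) {
    assert(n >= 2);
    double a = 0.0, b = 0.0;
    for (unsigned i = 1; i < n; ++i) { a += 1.0 / i; b += 1.0 / (double(i) * i); }
    return theta / a + b * theta * theta / (a * a);
}
```

For n = 2 the two estimators coincide, and both functions return theta + theta^2.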

double Sequence::PolySIM::WallsB ( void   )  [virtual]
Returns:
Wall's B Statistic. Wall, J. (1999) Genetical Research 74, pp 65-79
Author:
Kevin Thornton

Reimplemented from Sequence::PolySNP.

Definition at line 633 of file PolySIM.cc.

unsigned Sequence::PolySIM::WallsBprime ( void   )  [virtual]
Returns:
Wall's B' Statistic. Wall, J. (1999) Genetical Research 74, pp 65-79
Author:
Kevin Thornton

Reimplemented from Sequence::PolySNP.

Definition at line 639 of file PolySIM.cc.

double Sequence::PolySIM::WallsQ ( void   )  [virtual]
Returns:
Wall's Q Statistic. Wall, J. (1999) Genetical Research 74, pp 65-79
Author:
Kevin Thornton

Reimplemented from Sequence::PolySNP.

Definition at line 645 of file PolySIM.cc.
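Wall's statistics rest on congruent pairs of adjacent segregating sites, i.e., pairs that split the sample into the same two groups. A common formulation, assumed here, is B' = number of congruent adjacent pairs, B = B'/(S-1), and Q = (B' + A)/S, where A is the number of distinct partitions among those congruent pairs. The hypothetical sketch below assumes every column of the 0/1 haplotype strings is segregating.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <set>
#include <string>
#include <vector>

struct WallStatsResult { unsigned Bprime; double B; double Q; };

// Column s, flipped so the first haplotype carries '0'; two sites are
// congruent when their normalized columns are identical.
std::string partition_of(const std::vector<std::string>& haps, std::size_t s) {
    std::string col;
    const bool flip = haps[0][s] == '1';
    for (const std::string& h : haps) col += ((h[s] == '1') != flip) ? '1' : '0';
    return col;
}

WallStatsResult wall_stats(const std::vector<std::string>& haps) {
    assert(!haps.empty());
    const std::size_t S = haps[0].size();
    const double nan = std::numeric_limits<double>::quiet_NaN();
    if (S < 2) return {0, nan, nan};
    unsigned Bprime = 0;
    std::set<std::string> partitions;  // distinct congruent partitions; A = size
    for (std::size_t s = 0; s + 1 < S; ++s) {
        const std::string left = partition_of(haps, s);
        if (left == partition_of(haps, s + 1)) {
            ++Bprime;
            partitions.insert(left);
        }
    }
    const double B = Bprime / double(S - 1);
    const double Q = (Bprime + partitions.size()) / double(S);
    return {Bprime, B, Q};
}
```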


The documentation for this class was generated from the following files: PolySIM.hpp and PolySIM.cc.

Generated on Mon Jul 12 15:22:04 2010 for libsequence by  doxygen 1.6.1