Sequence::SimpleSNP Class Reference
[Classes Related to Polymorphism tables]

SNP table data format.

#include <Sequence/SimpleSNP.hpp>

Inherits Sequence::PolyTable.

Public Types

typedef std::string & reference
typedef const std::string & const_reference
typedef std::vector< std::string >::size_type size_type
typedef std::vector< std::string >::iterator data_iterator
typedef std::vector< std::string >::const_iterator const_data_iterator
typedef std::vector< double >::iterator pos_iterator
typedef std::vector< double >::const_iterator const_pos_iterator
typedef Sequence::polySiteVector::const_iterator const_site_iterator

Public Member Functions

 SimpleSNP (const bool diploid=0, const bool isofemale=0)
bool outgroup (void) const
void set_outgroup (const bool &b)
std::string label (unsigned i) const
std::istream & read (std::istream &s) throw (Sequence::badFormat,std::exception)
std::ostream & print (std::ostream &o) const
data_iterator begin ()
const_data_iterator begin () const
data_iterator end ()
const_data_iterator end () const
pos_iterator pbegin ()
const_pos_iterator pbegin () const
pos_iterator pend ()
const_pos_iterator pend () const
const_site_iterator sbegin () const
const_site_iterator send () const
std::vector< double > GetPositions (void) const
std::vector< std::string > GetData (void) const
virtual void ApplyFreqFilter (unsigned mincount, bool haveOutgroup=false, unsigned outgroup=0)
virtual void RemoveMultiHits (bool skipOutgroup=false, unsigned outgroup=0)
virtual void RemoveMissing (bool skipOutgroup=false, unsigned outgroup=0)
virtual void RemoveAmbiguous (bool skipOutgroup=false, unsigned outgroup=0)
virtual void Binary (bool haveOutgroup=false, unsigned outgroup=0, bool strictInfSites=true)
virtual bool operator== (const PolyTable &rhs) const
virtual bool operator!= (const PolyTable &rhs) const
 operator Sequence::polySiteVector () const
const_reference operator[] (const size_type &i) const
reference operator[] (const size_type &i)
bool empty () const
bool assign (PolyTable::const_site_iterator beg, PolyTable::const_site_iterator end)
template<typename numeric_type , typename string_type >
bool assign (const numeric_type *_positions, const size_t &_num_positions, const string_type *_data, const size_t &_num_individuals)
size_type size (void) const
double position (const std::vector< double >::size_type &i) const
unsigned numsites (void) const

Detailed Description

SNP table data format.

This class is used to deal with simple "polymorphism tables" that are formatted in a particular way.

The major purpose of this class is to be able to read in data files in this format. An example of the format is as follows:

6 6
679 1004 1153 1155 1277 1295
N N N N N N
Sim2 A C C G A A
Sim3 G C T G A C
Sim4 A C C T G C
Sim5 A C C T G C
Sim7 A T C T A C
Sim8 A C C G A A

The two numbers on the first line are the sample size and the number of segregating sites, respectively. The next line contains the position of each site. The third line contains the outgroup state for each variable site; if a state is unknown, use an 'N' to indicate the ambiguity. Each remaining line contains a unique sequence name followed by the states at each segregating site.

Finally, if all of the outgroup states are missing (as is the case in the above example), then it is assumed that no outgroup was typed, and no outgroup sequence is stored in PolyTable::data. However, if at least one of the outgroup states is unambiguous, a std::string representing the outgroup states is stored as the first element in PolyTable::data.
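
To make the layout above concrete, here is a minimal, standalone C++11 sketch of a reader for it. It is not the libsequence implementation: SnpTableSketch and parse_snp_table are hypothetical names introduced only for this illustration, and the sketch assumes single-character states separated by whitespace, as in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for this sketch; not a libsequence type.
struct SnpTableSketch {
    std::vector<double> positions;
    std::vector<std::string> labels;  // one label per ingroup sequence
    std::vector<std::string> data;    // outgroup first, if one was typed
    bool has_outgroup = false;
};

// Parse the layout described above. Returns false on malformed input.
inline bool parse_snp_table(std::istream& in, SnpTableSketch& out) {
    std::size_t nsam = 0, nsites = 0;
    if (!(in >> nsam >> nsites)) return false;

    out = SnpTableSketch();
    out.positions.resize(nsites);
    for (double& p : out.positions)
        if (!(in >> p)) return false;

    // Outgroup line: one state per site, 'N' meaning unknown.
    std::string outgroup(nsites, 'N');
    for (char& c : outgroup)
        if (!(in >> c)) return false;
    // The outgroup is stored only if at least one state is not 'N'.
    out.has_outgroup = outgroup.find_first_not_of('N') != std::string::npos;
    if (out.has_outgroup) out.data.push_back(outgroup);

    // Remaining lines: a label, then one state per site.
    for (std::size_t i = 0; i < nsam; ++i) {
        std::string label, row(nsites, 'N');
        if (!(in >> label)) return false;
        for (char& c : row)
            if (!(in >> c)) return false;
        out.labels.push_back(label);
        out.data.push_back(row);
    }
    assert(out.labels.size() == nsam);
    return true;
}
```

Fed the example table through a std::istringstream, this sketch stores six ingroup strings and no outgroup, because every outgroup state is 'N'.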

Definition at line 67 of file SimpleSNP.hpp.


Member Typedef Documentation

typedef std::vector<std::string>::const_iterator Sequence::PolyTable::const_data_iterator [inherited]

const iterator to the data

Definition at line 90 of file PolyTable.hpp.

typedef std::vector<double>::const_iterator Sequence::PolyTable::const_pos_iterator [inherited]

const iterator to the positions

Definition at line 98 of file PolyTable.hpp.

typedef Sequence::polySiteVector::const_iterator Sequence::PolyTable::const_site_iterator [inherited]

Const iterator to segregating sites. The value type of this iterator is const std::pair<double,std::string>, where the double is the position of the segregating site, and the string the list of states at the site. The first character in the string corresponds to the state of the first character in the PolyTable (i.e. (*this)[0]), etc.
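
The row/column relationship described above can be sketched without the library. The code below assumes that Sequence::polySiteVector behaves like a std::vector of (position, states) pairs, which is what the value type suggests; SiteVectorSketch and rows_to_sites are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for Sequence::polySiteVector (an assumption of this sketch).
typedef std::vector<std::pair<double, std::string> > SiteVectorSketch;

// Build one (position, states) pair per column of the table.
// Character i of each states string comes from row i of data.
inline SiteVectorSketch rows_to_sites(const std::vector<double>& positions,
                                      const std::vector<std::string>& data) {
    SiteVectorSketch sites;
    for (std::size_t j = 0; j < positions.size(); ++j) {
        std::string column;
        for (std::size_t i = 0; i < data.size(); ++i) {
            assert(data[i].size() == positions.size());
            column += data[i][j];
        }
        sites.push_back(std::make_pair(positions[j], column));
    }
    return sites;
}
```

Iterating over the returned vector visits the table column by column, which is the view that a const_site_iterator is described as providing.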

Examples:
PolyTableIterators.cc.

Definition at line 107 of file PolyTable.hpp.

typedef std::vector<std::string>::iterator Sequence::PolyTable::data_iterator [inherited]

non-const iterator to the data

Definition at line 86 of file PolyTable.hpp.

typedef std::vector<double>::iterator Sequence::PolyTable::pos_iterator [inherited]

non-const iterator to the positions

Definition at line 94 of file PolyTable.hpp.


Constructor & Destructor Documentation

Sequence::SimpleSNP::SimpleSNP ( const bool  diploid = 0,
const bool  isofemale = 0 
) [inline]

The two bools that this constructor takes allow you to deal with two very different types of polymorphism data. If both bools are set to 0 (the default), the data are simply read in as they are. However, if diploid == 1, then the n sequences that are read in are assumed to be diploid genotypes and are converted into 2n strings. Further, if heterozygous bases are encountered (R, W, etc.), the two possible states are arbitrarily assigned to each sequence.

If isofemale==1, it is assumed that the data represent real haplotype data (i.e. phase is known). The name for the bool comes from the fact that data gathered from Drosophila lines is often obtained from isofemale stocks, making them homozygous such that the phase of each SNP is known. If a heterozygous base is found, one of the two possible states will be assigned randomly (NOT IMPLEMENTED YET!)
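
As an illustration of the diploid == 1 case, the following standalone sketch splits one genotype string into two haplotype strings using the standard IUPAC two-base codes. It is not the library's code: split_genotype and split_diploid are hypothetical helpers, and the sketch always gives the first listed base to the first string, which is one arbitrary way of assigning phase.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Resolve one IUPAC two-base code to its two alleles.
// Any other character is treated as homozygous (or missing) and duplicated.
inline std::pair<char, char> split_genotype(char code) {
    switch (code) {
        case 'R': return std::make_pair('A', 'G');
        case 'Y': return std::make_pair('C', 'T');
        case 'M': return std::make_pair('A', 'C');
        case 'K': return std::make_pair('G', 'T');
        case 'S': return std::make_pair('C', 'G');
        case 'W': return std::make_pair('A', 'T');
        default:  return std::make_pair(code, code);
    }
}

// Turn one diploid genotype string into two haplotype strings.
// The phase chosen here is arbitrary: first allele goes to the first string.
inline std::pair<std::string, std::string> split_diploid(const std::string& genotype) {
    std::pair<std::string, std::string> haps;
    for (char code : genotype) {
        const std::pair<char, char> alleles = split_genotype(code);
        haps.first += alleles.first;
        haps.second += alleles.second;
    }
    assert(haps.first.size() == genotype.size());
    return haps;
}
```

Applying split_diploid to each of n genotype strings yields the 2n strings mentioned above.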

Definition at line 74 of file SimpleSNP.hpp.


Member Function Documentation

void Sequence::PolyTable::ApplyFreqFilter ( unsigned  mincount,
bool  haveOutgroup = false,
unsigned  outgroup = 0 
) [virtual, inherited]

Go through the data and remove all positions where there is a variant whose count (number of occurrences in the sample) is < mincount.

Parameters:
mincount minimum count of a variant in the data. Variants that occur < mincount times are thrown out.
haveOutgroup true if an outgroup is present in the data, false otherwise
outgroup the index in the data array containing the outgroup (if present)
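
The filtering rule can be sketched on plain vectors as follows. This is an illustration under stated assumptions, not the code in PolyTable.cc: apply_count_filter is a hypothetical free function, the outgroup row is left out of the counts when have_outgroup is true, and missing data ('N') is not counted as a variant.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Remove every site at which some state occurs fewer than mincount times.
inline void apply_count_filter(std::vector<double>& positions,
                               std::vector<std::string>& data,
                               unsigned mincount,
                               bool have_outgroup = false,
                               std::size_t outgroup = 0) {
    std::vector<double> kept_pos;
    std::vector<std::string> kept(data.size());
    for (std::size_t j = 0; j < positions.size(); ++j) {
        // Count each state in column j (assumption: skip outgroup and 'N').
        std::map<char, unsigned> counts;
        for (std::size_t i = 0; i < data.size(); ++i) {
            assert(data[i].size() == positions.size());
            if (have_outgroup && i == outgroup) continue;
            if (data[i][j] != 'N') ++counts[data[i][j]];
        }
        bool keep = true;
        for (std::map<char, unsigned>::const_iterator it = counts.begin();
             it != counts.end(); ++it)
            if (it->second < mincount) keep = false;
        if (!keep) continue;
        kept_pos.push_back(positions[j]);
        for (std::size_t i = 0; i < data.size(); ++i) kept[i] += data[i][j];
    }
    positions.swap(kept_pos);
    data.swap(kept);
}
```

With mincount = 2, for example, any site carrying a singleton is dropped, and the positions vector shrinks in step with the data.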

Definition at line 256 of file PolyTable.cc.

template<typename numeric_type , typename string_type >
bool Sequence::PolyTable::assign ( const numeric_type *  _positions,
const size_t &  _num_positions,
const string_type *  _data,
const size_t &  _num_individuals 
) [inline, inherited]

Assign SNP data to the polymorphism table from a vector/array.

Parameters:
_positions an array representing the positions of the SNPs
_num_positions the number of elements in _positions
_data an array containing the characters for each SNP in each individual
_num_individuals the number of elements in _data
Note:
If the length of the elements in _data does not equal _num_positions, the assignment will fail and you will be left with an empty polymorphism table. The following piece of code shows how to assign from a std::vector:
      Sequence::PolySites snpTable;
      std::vector<double> positions;
      std::vector<std::string> data;
      //fill positions and data...
      //pass pointers to the first elements together with the element counts
      if ( snpTable.assign(&positions[0],positions.size(),&data[0],data.size()) )
      {
      //ok
      }
      else
      {
      //assignment failed, and snpTable is left empty
      }

Definition at line 34 of file PolyTable.tcc.

bool Sequence::PolyTable::assign ( PolyTable::const_site_iterator  beg,
PolyTable::const_site_iterator  end 
) [inherited]

Assignment operation, allowing a range of polymorphic sites to be assigned to a polymorphism table. This exists mainly for two purposes. One is the ability to assign tables from "slices" of other tables. Second is to facilitate the writing of "sliding window" routines.

Returns:
true if the assignment was successful, false otherwise. The only case where false is returned is if the number of individuals at each site is not constant from beg to end.
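
The range assignment and its failure condition can be sketched on standard containers. The code below is an illustration, not the definition in PolyTable.cc: SiteVectorSketch stands in for Sequence::polySiteVector, assign_from_sites is a hypothetical free function, and treating an empty range as a successful assignment of an empty table is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for Sequence::polySiteVector (an assumption of this sketch).
typedef std::vector<std::pair<double, std::string> > SiteVectorSketch;

// Rebuild row-wise data from the sites in [beg, end).
// Returns false, leaving both outputs empty, if the sites differ in length.
inline bool assign_from_sites(SiteVectorSketch::const_iterator beg,
                              SiteVectorSketch::const_iterator end,
                              std::vector<double>& positions,
                              std::vector<std::string>& data) {
    positions.clear();
    data.clear();
    if (beg == end) return true;  // assumption: an empty slice is not an error
    const std::size_t nsam = beg->second.size();
    data.assign(nsam, std::string());
    for (; beg != end; ++beg) {
        if (beg->second.size() != nsam) {
            positions.clear();
            data.clear();
            return false;
        }
        positions.push_back(beg->first);
        for (std::size_t i = 0; i < nsam; ++i) data[i] += beg->second[i];
    }
    assert(data.empty() || data[0].size() == positions.size());
    return true;
}
```

A sliding-window routine would call such a function repeatedly, advancing beg and end over the site vector so that each call covers the sites falling in the current window.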

Definition at line 71 of file PolyTable.cc.

PolyTable::const_data_iterator Sequence::PolyTable::begin (  )  const [inherited]
Returns:
a const iterator pointing to the beginning of the std::vector<string> containing the data

Definition at line 173 of file PolyTable.cc.

PolyTable::data_iterator Sequence::PolyTable::begin (  )  [inherited]
Returns:
an iterator pointing to the beginning of the std::vector<string> containing the data
Examples:
PolyTableIterators.cc.

Definition at line 153 of file PolyTable.cc.

void Sequence::PolyTable::Binary ( bool  haveOutgroup = false,
unsigned  outgroup = 0,
bool  strictInfSites = true 
) [virtual, inherited]

Recode the polymorphism table in 0,1 (binary notation)

Parameters:
haveOutgroup use true if an outgroup is present, false otherwise
outgroup the index of the outgroup in the data vector used to construct the object
strictInfSites if true, throw out all sites with > 2 character states (including outgroup!)
Note:
If haveOutgroup == true, then 0 means an ancestral state and 1 a derived state in the resulting table. If haveOutgroup == true and there are sites with missing data in the outgroup sequence, those sites are removed from the data, since it is assumed that you want to know the ancestral/derived state for every site.
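
The outgroup case can be sketched as follows. This is a standalone illustration and not the code in PolyTable.cc: binary_recode is a hypothetical function that covers only haveOutgroup == true with strictInfSites == true, and leaving ingroup 'N' characters unchanged is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Recode to '0' (ancestral) and '1' (derived) using the outgroup row.
// Sites with a missing outgroup state or more than 2 states are dropped.
inline void binary_recode(std::vector<double>& positions,
                          std::vector<std::string>& data,
                          std::size_t outgroup) {
    assert(outgroup < data.size());
    std::vector<double> kept_pos;
    std::vector<std::string> kept(data.size());
    for (std::size_t j = 0; j < positions.size(); ++j) {
        const char ancestral = data[outgroup][j];
        if (ancestral == 'N') continue;  // ancestral state unknown
        // Count distinct states, outgroup included, ignoring missing data.
        std::set<char> states;
        for (std::size_t i = 0; i < data.size(); ++i)
            if (data[i][j] != 'N') states.insert(data[i][j]);
        if (states.size() > 2) continue;  // violates strict infinite sites
        kept_pos.push_back(positions[j]);
        for (std::size_t i = 0; i < data.size(); ++i) {
            const char c = data[i][j];
            kept[i] += (c == 'N') ? 'N' : (c == ancestral ? '0' : '1');
        }
    }
    positions.swap(kept_pos);
    data.swap(kept);
}
```

After recoding, the outgroup row consists only of '0' characters, since each retained site is coded relative to the outgroup state.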

Reimplemented in Sequence::SimData.

Definition at line 440 of file PolyTable.cc.

bool Sequence::PolyTable::empty (  )  const [inherited]
Returns:
true if object contains no data, false otherwise
Examples:
slidingWindow.cc.

Definition at line 66 of file PolyTable.cc.

PolyTable::const_data_iterator Sequence::PolyTable::end (  )  const [inherited]
Returns:
a const iterator pointing to the end of the std::vector<string> containing the data

Definition at line 182 of file PolyTable.cc.

PolyTable::data_iterator Sequence::PolyTable::end (  )  [inherited]
Returns:
an iterator pointing to the end of the std::vector<string> containing the data
Examples:
PolyTableIterators.cc.

Definition at line 163 of file PolyTable.cc.

std::vector< std::string > Sequence::PolyTable::GetData ( void   )  const [inherited]

Returns PolyTable::data, a vector of std::strings containing polymorphic sites. Assuming the vector is returned to a vector<string> called data, data[i][j] accesses the j-th site of the i-th sequence.

Definition at line 527 of file PolyTable.cc.

std::vector< double > Sequence::PolyTable::GetPositions ( void   )  const [inherited]

Returns PolyTable::positions.

Definition at line 519 of file PolyTable.cc.

std::string Sequence::SimpleSNP::label ( unsigned  i  )  const
Returns:
the label of the i-th individual in the data

Definition at line 287 of file SimpleSNP.cc.

unsigned Sequence::PolyTable::numsites ( void   )  const [inline, inherited]

Return how many positions are stored in PolyTable::positions

Examples:
bottleneck.cc.

Definition at line 233 of file PolyTable.hpp.

Sequence::PolyTable::operator Sequence::polySiteVector (  )  const [inherited]

allow (implicit) typecast of Sequence::PolyTable to Sequence::polySiteVector

Definition at line 140 of file PolyTable.cc.

reference Sequence::PolyTable::operator[] ( const size_type &  i  )  [inline, inherited]

Return the i-th element of PolyTable::data.

Note:
range-checking done by assert()

Definition at line 160 of file PolyTable.hpp.

const_reference Sequence::PolyTable::operator[] ( const size_type &  i  )  const [inline, inherited]

Return the i-th element of PolyTable::data.

Note:
range-checking done by assert()

Definition at line 150 of file PolyTable.hpp.

bool Sequence::SimpleSNP::outgroup ( void   )  const

returns true if there is outgroup information, false otherwise

Definition at line 300 of file SimpleSNP.cc.

PolyTable::const_pos_iterator Sequence::PolyTable::pbegin (  )  const [inherited]
Returns:
a const iterator pointing to the beginning of the list of positions

Definition at line 209 of file PolyTable.cc.

PolyTable::pos_iterator Sequence::PolyTable::pbegin (  )  [inherited]
Returns:
an iterator pointing to the beginning of the list of positions
Examples:
PolyTableIterators.cc.

Definition at line 191 of file PolyTable.cc.

PolyTable::const_pos_iterator Sequence::PolyTable::pend (  )  const [inherited]
Returns:
a const iterator pointing to the end of the list of positions

Definition at line 217 of file PolyTable.cc.

PolyTable::pos_iterator Sequence::PolyTable::pend (  )  [inherited]
Returns:
an iterator pointing to the end of the list of positions
Examples:
PolyTableIterators.cc.

Definition at line 200 of file PolyTable.cc.

double Sequence::PolyTable::position ( const std::vector< double >::size_type &  i  )  const [inline, inherited]

Return the i-th position from the PolyTable::positions.

Note:
range-checking done by assert()

Definition at line 223 of file PolyTable.hpp.

std::ostream & Sequence::SimpleSNP::print ( std::ostream &  h  )  const [virtual]

print is a pure virtual function in Sequence::PolyTable. Calls to ostream & operator<<(ostream & s, PolyTable & c) act via this routine, which must be defined in all derived classes.

Implements Sequence::PolyTable.

Definition at line 217 of file SimpleSNP.cc.

std::istream & Sequence::SimpleSNP::read ( std::istream &  s  )  throw (Sequence::badFormat,std::exception) [virtual]
Exceptions:
Sequence::badFormat 

Implements Sequence::PolyTable.

Definition at line 37 of file SimpleSNP.cc.

void Sequence::PolyTable::RemoveAmbiguous ( bool  skipOutgroup = false,
unsigned  outgroup = 0 
) [virtual, inherited]

go through the data and remove all the sites with states other than {A,G,C,T,N,-}

Parameters:
skipOutgroup default is false. If true, the character state of the outgroup is ignored.
outgroup the index of the outgroup in the data vector
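
A minimal sketch of this filter on plain vectors is shown below. It is not the code in PolyTable.cc: remove_ambiguous is a hypothetical free function, and the allowed set is taken directly from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Keep only the sites whose states are all in {A,G,C,T,N,-}.
inline void remove_ambiguous(std::vector<double>& positions,
                             std::vector<std::string>& data,
                             bool skip_outgroup = false,
                             std::size_t outgroup = 0) {
    static const std::string allowed("AGCTN-");
    std::vector<double> kept_pos;
    std::vector<std::string> kept(data.size());
    for (std::size_t j = 0; j < positions.size(); ++j) {
        bool ok = true;
        for (std::size_t i = 0; i < data.size(); ++i) {
            assert(data[i].size() == positions.size());
            if (skip_outgroup && i == outgroup) continue;  // ignore outgroup state
            if (allowed.find(data[i][j]) == std::string::npos) ok = false;
        }
        if (!ok) continue;
        kept_pos.push_back(positions[j]);
        for (std::size_t i = 0; i < data.size(); ++i) kept[i] += data[i][j];
    }
    positions.swap(kept_pos);
    data.swap(kept);
}
```

Note that with skip_outgroup set, a retained site may still carry an ambiguous character in the outgroup row, because that row is not inspected.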

Definition at line 402 of file PolyTable.cc.

void Sequence::PolyTable::RemoveMissing ( bool  skipOutgroup = false,
unsigned  outgroup = 0 
) [virtual, inherited]

go through the data and remove all the sites with missing data (the character N).

Parameters:
skipOutgroup default is false. If true, the character state of the outgroup is ignored.
outgroup the index of the outgroup in the data vector
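
The site-removal routines in this class share one pattern: examine each column, then drop it or keep it. The sketch below expresses that pattern as a generic helper and applies it to missing data. Both remove_sites_if and remove_missing are hypothetical names for this illustration (C++11), not libsequence functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Drop every column for which drop(column) is true,
// keeping positions in step with data.
template <typename Predicate>
void remove_sites_if(std::vector<double>& positions,
                     std::vector<std::string>& data,
                     Predicate drop) {
    std::vector<double> kept_pos;
    std::vector<std::string> kept(data.size());
    for (std::size_t j = 0; j < positions.size(); ++j) {
        std::string column;
        for (std::size_t i = 0; i < data.size(); ++i) {
            assert(data[i].size() == positions.size());
            column += data[i][j];
        }
        if (drop(column)) continue;
        kept_pos.push_back(positions[j]);
        for (std::size_t i = 0; i < data.size(); ++i) kept[i] += column[i];
    }
    positions.swap(kept_pos);
    data.swap(kept);
}

// Remove sites with missing data ('N'), optionally ignoring the outgroup row.
inline void remove_missing(std::vector<double>& positions,
                           std::vector<std::string>& data,
                           bool skip_outgroup = false,
                           std::size_t outgroup = 0) {
    remove_sites_if(positions, data,
                    [=](const std::string& column) -> bool {
                        for (std::size_t i = 0; i < column.size(); ++i) {
                            if (skip_outgroup && i == outgroup) continue;
                            if (column[i] == 'N') return true;
                        }
                        return false;
                    });
}
```

Separating the column walk from the predicate keeps each filter to a few lines, at the cost of building a temporary string per column.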

Definition at line 360 of file PolyTable.cc.

void Sequence::PolyTable::RemoveMultiHits ( bool  skipOutgroup = false,
unsigned  outgroup = 0 
) [virtual, inherited]

Go through the data and remove all the sites with more than 2 states segregating. By default, this routine also removes sites where there are 2 states segregating in the ingroup and the outgroup (if present) has a 3rd state.

Parameters:
skipOutgroup default is false. If true, the character state of the outgroup is ignored.
outgroup the index of the outgroup in the data vector
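
The effect of skipOutgroup on the state count can be sketched as follows. This is an illustration, not the code in PolyTable.cc: remove_multi_hits is a hypothetical free function, and excluding missing data ('N') from the state count is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Remove sites with more than 2 distinct states. The outgroup row is
// counted unless skip_outgroup is true.
inline void remove_multi_hits(std::vector<double>& positions,
                              std::vector<std::string>& data,
                              bool skip_outgroup = false,
                              std::size_t outgroup = 0) {
    std::vector<double> kept_pos;
    std::vector<std::string> kept(data.size());
    for (std::size_t j = 0; j < positions.size(); ++j) {
        std::set<char> states;
        for (std::size_t i = 0; i < data.size(); ++i) {
            assert(data[i].size() == positions.size());
            if (skip_outgroup && i == outgroup) continue;
            if (data[i][j] != 'N') states.insert(data[i][j]);
        }
        if (states.size() > 2) continue;  // a multiple hit
        kept_pos.push_back(positions[j]);
        for (std::size_t i = 0; i < data.size(); ++i) kept[i] += data[i][j];
    }
    positions.swap(kept_pos);
    data.swap(kept);
}
```

A site with ingroup states A and C and outgroup state T is therefore removed by default but kept when skip_outgroup is true.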

Definition at line 321 of file PolyTable.cc.

PolyTable::const_site_iterator Sequence::PolyTable::sbegin (  )  const [inherited]
Returns:
an object of type Sequence::PolyTable::const_site_iterator. These iterators allow access to the columns (segregating sites) of polymorphism tables.
Examples:
PolyTableIterators.cc, slidingWindow.cc, and slidingWindow2.cc.

Definition at line 226 of file PolyTable.cc.

PolyTable::const_site_iterator Sequence::PolyTable::send (  )  const [inherited]
Returns:
an object of type Sequence::PolyTable::const_site_iterator. These iterators allow access to the columns (segregating sites) of polymorphism tables.
Examples:
PolyTableIterators.cc, slidingWindow.cc, and slidingWindow2.cc.

Definition at line 241 of file PolyTable.cc.

size_type Sequence::PolyTable::size ( void   )  const [inline, inherited]

Return how many std::strings are stored in PolyTable::data.

Definition at line 214 of file PolyTable.hpp.


The documentation for this class was generated from the following files:

Generated on Mon Jul 12 15:22:05 2010 for libsequence by  doxygen 1.6.1