C_SuffixArrayApplicationBase Class Reference

#include <_SuffixArrayApplicationBase.h>

Inheritance diagram for C_SuffixArrayApplicationBase:

C_SuffixArrayScanningBase, C_SuffixArraySearchApplicationBase, C_SuffixArrayLanguageModel

Public Member Functions

void loadData (const char *fileNameStem, bool noVoc, bool noOffset, bool noLevel1Bucket)
TextLenType returnCorpusSize ()
C_SuffixArrayApplicationBase ()
virtual ~C_SuffixArrayApplicationBase ()

Protected Member Functions

void loadVoc (const char *filename)
void loadOffset (const char *filename)
void loadSuffix (const char *filename)
void loadCorpusAndInitMem (const char *filename)

Protected Attributes

TextLenType corpusSize
bool noVocabulary
bool noOffset
bool noLevel1Bucket
C_IDVocabulary * voc
IndexType sentIdStart
IndexType vocIdForSentStart
IndexType vocIdForSentEnd
IndexType vocIdForCorpusEnd
IndexType * corpus_list
unsigned char * offset_list
TextLenType * suffix_list
S_level1BucketElement * level1Buckets

Detailed Description

Base class of suffix array applications. Provides functions to load the suffix array and initialize the required vocIDs.

Revision: 3665
Last modified: 2007-06-16 15:40:59 -0400 (Saturday, 16 June 2007)

Definition at line 23 of file _SuffixArrayApplicationBase.h.


Constructor & Destructor Documentation

C_SuffixArrayApplicationBase::C_SuffixArrayApplicationBase (  ) 

Revision: 3815
Last modified: 2007-07-06 14:31:12 -0400 (Friday, 06 July 2007)

Definition at line 18 of file _SuffixArrayApplicationBase.cpp.

References level1Buckets, noLevel1Bucket, noOffset, and noVocabulary.

C_SuffixArrayApplicationBase::~C_SuffixArrayApplicationBase (  )  [virtual]

Definition at line 26 of file _SuffixArrayApplicationBase.cpp.

References voc.


Member Function Documentation

void C_SuffixArrayApplicationBase::loadData ( const char * fileNameStem, bool noVoc, bool noOffset, bool noLevel1Bucket )

Loads the indexed corpus, suffix array, vocabulary, and offset information into memory for follow-up applications. Loading the vocabulary and the offset is optional and is controlled by the arguments. When the testing data shares the vocabulary of the training data and sentences and n-grams are represented only by vocIDs, the vocabulary, which maps between vocIDs and word text, can be skipped to save memory.

If the suffix array object does not need to locate the sentence ID of an n-gram occurrence, the offset information is not needed.

Be careful here: for efficiency, the suffix array class does not check in the search functions whether the offset has been loaded. You need to know how the suffix array object will be used (whether the offset is needed) and load the data accordingly.

Parameters:
fileNameStem The filename of the corpus. This should be the same filename that was used with IndexSA.
noVoc If set to 'true', the vocabulary will not be loaded.
noOffset If set to 'true', the offset information will not be loaded, and the <sentId, offsetInSent> information for an n-gram's occurrences cannot be calculated.
noLevel1Bucket Level1Bucket speeds up the search at the cost of additional memory. Applications that do not need to locate n-grams in the corpus (such as the corpus scanning application) do not need to create Level1Bucket.
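
The effect of the three flags can be summarized in a small sketch. The function `planLoad` below is a hypothetical stand-in and not part of SALM; it only models, following the parameter descriptions above, which components a given combination of arguments would bring into memory. It does not read any files.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of SALM): lists the components that a call
// with these flags would need, according to the parameter descriptions.
std::vector<std::string> planLoad(const char* fileNameStem,
                                  bool noVoc, bool noOffset, bool noLevel1Bucket) {
    assert(fileNameStem != nullptr);
    const std::string stem(fileNameStem);

    // The indexed corpus and the suffix array are always required.
    std::vector<std::string> parts = {stem + ": corpus", stem + ": suffix array"};

    if (!noVoc)          parts.push_back(stem + ": vocabulary");
    if (!noOffset)       parts.push_back(stem + ": offset");
    if (!noLevel1Bucket) parts.push_back(stem + ": level-1 buckets");
    return parts;
}
```

For example, a corpus scanning application that works only with vocIDs could pass 'true' for all three flags and would then need only the corpus and the suffix array.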

Definition at line 60 of file _SuffixArrayApplicationBase.cpp.

References loadCorpusAndInitMem(), loadOffset(), loadSuffix(), loadVoc(), and noVocabulary.

Referenced by C_SuffixArrayScanningBase::C_SuffixArrayScanningBase(), and C_SuffixArraySearchApplicationBase::loadData_forSearch().

TextLenType C_SuffixArrayApplicationBase::returnCorpusSize (  ) 

Definition at line 310 of file _SuffixArrayApplicationBase.cpp.

References corpusSize.

void C_SuffixArrayApplicationBase::loadVoc ( const char *  filename  )  [protected]

Definition at line 105 of file _SuffixArrayApplicationBase.cpp.

References voc.

Referenced by loadData().

void C_SuffixArrayApplicationBase::loadOffset ( const char *  filename  )  [protected]

Definition at line 264 of file _SuffixArrayApplicationBase.cpp.

References SIZE_ONE_READ.

Referenced by loadData().

void C_SuffixArrayApplicationBase::loadSuffix ( const char *  filename  )  [protected]

Definition at line 186 of file _SuffixArrayApplicationBase.cpp.

References corpusSize, sentIdStart, and SIZE_ONE_READ.

Referenced by loadData().

void C_SuffixArrayApplicationBase::loadCorpusAndInitMem ( const char *  filename  )  [protected]

Definition at line 110 of file _SuffixArrayApplicationBase.cpp.

References corpus_list, corpusSize, level1BucketElement::first, level1Buckets, offset_list, sentIdStart, SIZE_ONE_READ, suffix_list, vocIdForCorpusEnd, vocIdForSentEnd, and vocIdForSentStart.

Referenced by loadData().


Field Documentation

TextLenType C_SuffixArrayApplicationBase::corpusSize [protected]

Definition at line 33 of file _SuffixArrayApplicationBase.h.

Referenced by loadCorpusAndInitMem(), C_SuffixArraySearchApplicationBase::loadData_forSearch(), loadSuffix(), and returnCorpusSize().

bool C_SuffixArrayApplicationBase::noVocabulary [protected]

Definition at line 40 of file _SuffixArrayApplicationBase.h.

Referenced by C_SuffixArrayApplicationBase(), and loadData().

bool C_SuffixArrayApplicationBase::noOffset [protected]

Definition at line 41 of file _SuffixArrayApplicationBase.h.

Referenced by C_SuffixArrayApplicationBase(), and C_SuffixArraySearchApplicationBase::C_SuffixArraySearchApplicationBase().

bool C_SuffixArrayApplicationBase::noLevel1Bucket [protected]

Definition at line 42 of file _SuffixArrayApplicationBase.h.

Referenced by C_SuffixArrayApplicationBase(), and C_SuffixArraySearchApplicationBase::C_SuffixArraySearchApplicationBase().

C_IDVocabulary* C_SuffixArrayApplicationBase::voc [protected]

Definition at line 44 of file _SuffixArrayApplicationBase.h.

Referenced by C_SuffixArrayLanguageModel::C_SuffixArrayLanguageModel(), C_SuffixArrayScanningBase::initializeForScanning(), loadVoc(), C_SuffixArrayLanguageModel::returnVocId(), and ~C_SuffixArrayApplicationBase().

IndexType C_SuffixArrayApplicationBase::sentIdStart [protected]

Definition at line 45 of file _SuffixArrayApplicationBase.h.

Referenced by loadCorpusAndInitMem(), C_SuffixArraySearchApplicationBase::loadData_forSearch(), loadSuffix(), and C_SuffixArraySearchApplicationBase::locateSendIdFromPos().

IndexType C_SuffixArrayApplicationBase::vocIdForSentStart [protected]

Reimplemented in C_SuffixArrayScanningBase.

Definition at line 46 of file _SuffixArrayApplicationBase.h.

Referenced by loadCorpusAndInitMem().

IndexType C_SuffixArrayApplicationBase::vocIdForSentEnd [protected]

Reimplemented in C_SuffixArrayScanningBase.

Definition at line 47 of file _SuffixArrayApplicationBase.h.

Referenced by loadCorpusAndInitMem().

IndexType C_SuffixArrayApplicationBase::vocIdForCorpusEnd [protected]

Reimplemented in C_SuffixArrayScanningBase.

Definition at line 48 of file _SuffixArrayApplicationBase.h.

Referenced by loadCorpusAndInitMem().

IndexType* C_SuffixArrayApplicationBase::corpus_list [protected]

Definition at line 50 of file _SuffixArrayApplicationBase.h.

Referenced by C_SuffixArraySearchApplicationBase::comparePhraseWithTextWithLCP(), loadCorpusAndInitMem(), C_SuffixArraySearchApplicationBase::loadData_forSearch(), and C_SuffixArraySearchApplicationBase::locateSendIdFromPos().

unsigned char* C_SuffixArrayApplicationBase::offset_list [protected]

Definition at line 51 of file _SuffixArrayApplicationBase.h.

Referenced by loadCorpusAndInitMem(), and C_SuffixArraySearchApplicationBase::locateSendIdFromPos().
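
As an illustration of why the offset data is needed to recover <sentId, offsetInSent>, here is a minimal sketch. It rests on assumptions that this page does not confirm: that every sentence in the corpus begins with a marker token whose ID is `sentIdStart` plus the sentence index, and that the offset entry for a position stores the distance back to that marker. The function `locateSentence` is hypothetical. Unlike the search functions described under loadData(), the sketch checks that the offset data is present.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch (not SALM code). Assumes each sentence starts with a
// marker token of ID sentIdStart + sentenceIndex, and offset[pos] is the
// distance from pos back to that marker.
std::pair<unsigned, unsigned> locateSentence(const std::vector<unsigned>& corpus,
                                             const std::vector<unsigned char>& offset,
                                             unsigned sentIdStart,
                                             std::size_t pos) {
    // Guard against the case where the offset data was never loaded.
    assert(!offset.empty() && "offset data was not loaded");
    assert(offset.size() == corpus.size() && pos < corpus.size());

    const unsigned offsetInSent = offset[pos];
    assert(offsetInSent <= pos);

    // Step back to the sentence marker and turn it into a sentence index.
    const unsigned marker = corpus[pos - offsetInSent];
    assert(marker >= sentIdStart);
    return {marker - sentIdStart, offsetInSent};
}
```

Under these assumptions the lookup takes constant time per occurrence, at the cost of one extra byte of memory per corpus position.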

TextLenType* C_SuffixArrayApplicationBase::suffix_list [protected]

Definition at line 52 of file _SuffixArrayApplicationBase.h.

Referenced by loadCorpusAndInitMem().

S_level1BucketElement* C_SuffixArrayApplicationBase::level1Buckets [protected]

Definition at line 54 of file _SuffixArrayApplicationBase.h.

Referenced by C_SuffixArrayApplicationBase(), C_SuffixArraySearchApplicationBase::C_SuffixArraySearchApplicationBase(), C_SuffixArraySearchApplicationBase::constructNgramSearchTable4SentWithLCP(), loadCorpusAndInitMem(), and C_SuffixArraySearchApplicationBase::locateSAPositionRangeForExactPhraseMatch().
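
The idea behind a level-1 bucket can be sketched as follows. This is an illustration under assumptions, not the SALM implementation: the `Bucket` type, its `last` member, and `buildLevel1Buckets` are hypothetical (only a `first` member is mentioned on this page). For each vocID, the bucket records the range of suffix array positions whose suffixes begin with that word, so a search can start from that range instead of the whole array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bucket type: range of suffix array indices whose suffixes
// start with a given vocID. -1 means the word never occurs.
struct Bucket {
    long first = -1;
    long last = -1;
};

// corpus: vocIDs in text order; suffix: corpus positions sorted by suffix.
std::vector<Bucket> buildLevel1Buckets(const std::vector<unsigned>& corpus,
                                       const std::vector<std::size_t>& suffix,
                                       unsigned vocSize) {
    std::vector<Bucket> buckets(vocSize);
    for (std::size_t i = 0; i < suffix.size(); ++i) {
        assert(suffix[i] < corpus.size());
        const unsigned word = corpus[suffix[i]];  // first word of this suffix
        assert(word < vocSize);
        if (buckets[word].first < 0) buckets[word].first = static_cast<long>(i);
        buckets[word].last = static_cast<long>(i);
    }
    return buckets;
}
```

Because the suffix array is sorted, all suffixes that begin with the same word are contiguous, which is why a single pass recording the first and last index per word is sufficient. The table costs one entry per vocabulary item, which is the additional memory mentioned for noLevel1Bucket.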


The documentation for this class was generated from the following files:
_SuffixArrayApplicationBase.h
_SuffixArrayApplicationBase.cpp
Generated on Fri Jul 6 23:11:20 2007 for SALM by doxygen 1.5.1