#include <_SuffixArrayApplicationBase.h>
Public Member Functions | |
| void | loadData (const char *fileNameStem, bool noVoc, bool noOffset, bool noLevel1Bucket) |
| TextLenType | returnCorpusSize () |
| C_SuffixArrayApplicationBase () | |
| virtual | ~C_SuffixArrayApplicationBase () |
Protected Member Functions | |
| void | loadVoc (const char *filename) |
| void | loadOffset (const char *filename) |
| void | loadSuffix (const char *filename) |
| void | loadCorpusAndInitMem (const char *filename) |
Protected Attributes | |
| TextLenType | corpusSize |
| bool | noVocabulary |
| bool | noOffset |
| bool | noLevel1Bucket |
| C_IDVocabulary * | voc |
| IndexType | sentIdStart |
| IndexType | vocIdForSentStart |
| IndexType | vocIdForSentEnd |
| IndexType | vocIdForCorpusEnd |
| IndexType * | corpus_list |
| unsigned char * | offset_list |
| TextLenType * | suffix_list |
| S_level1BucketElement * | level1Buckets |
Definition at line 23 of file _SuffixArrayApplicationBase.h.
| C_SuffixArrayApplicationBase::C_SuffixArrayApplicationBase | ( | ) |
Definition at line 18 of file _SuffixArrayApplicationBase.cpp.
References level1Buckets, noLevel1Bucket, noOffset, and noVocabulary.
| C_SuffixArrayApplicationBase::~C_SuffixArrayApplicationBase | ( | ) | [virtual] |
| void C_SuffixArrayApplicationBase::loadData | ( | const char * | fileNameStem, | |
| bool | noVoc, | |||
| bool | noOffset, | |||
| bool | noLevel1Bucket | |||
| ) |
Loads the indexed corpus, suffix array, vocabulary, and offset into memory for follow-up applications. Loading the vocabulary and the offset is optional and is controlled by the arguments. When the testing data shares the same vocabulary as the training data and only vocIds are used to represent sentences and n-grams, the vocabulary, which maps between a vocId and the word text, can be skipped to save memory.
If the suffix array object does not need to locate the sentence ID of an n-gram occurrence, the offset information is not needed.
Be very careful here: for efficiency, the suffix array class does not check in the search function whether the offset has been loaded. You need to know how the suffix array class will be used (whether the offset is needed) and load the data accordingly.
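Because the search path does not check whether the offset was loaded, a caller may want to keep track of the flags itself. The following is a minimal sketch of that idea, not part of the library: `LoadOptions`, its accessors, and `requireOffset()` are hypothetical names introduced only for illustration, and the class models nothing beyond the three boolean arguments documented for loadData().

```cpp
#include <stdexcept>

// Hypothetical helper (not a library class): remembers the flags that were
// passed to loadData() so that callers can check them before requesting
// <sentId, offsetInSent> positions.
class LoadOptions {
public:
    LoadOptions(bool noVoc, bool noOffset, bool noLevel1Bucket)
        : noVoc_(noVoc), noOffset_(noOffset), noLevel1Bucket_(noLevel1Bucket) {}

    // True when the vocId <-> word text mapping was loaded.
    bool canMapIdToWord() const { return !noVoc_; }

    // True when sentence IDs of n-gram occurrences can be computed.
    bool canLocateSentence() const { return !noOffset_; }

    // True when the Level1Bucket speed-up structure was created.
    bool hasLevel1Bucket() const { return !noLevel1Bucket_; }

    // Fails loudly instead of letting a lookup read an offset table
    // that was never loaded.
    void requireOffset() const {
        if (noOffset_) {
            throw std::logic_error("offset was not loaded; sentence IDs are unavailable");
        }
    }

private:
    bool noVoc_;
    bool noOffset_;
    bool noLevel1Bucket_;
};
```

A caller would construct `LoadOptions` with the same three values it passes to loadData() and call `requireOffset()` before any lookup that needs sentence IDs.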
| fileNameStem | The filename of the corpus. This should be the same filename used in IndexSA. | |
| noVoc | If set to 'true', the vocabulary will not be loaded. | |
| noOffset | If set to 'true', the offset information will not be loaded, and the <sentId, offsetInSent> information for an n-gram's occurrences cannot be calculated. | |
| noLevel1Bucket | Level1Bucket is used to speed up the search at the cost of additional memory. Applications that do not need to locate n-grams in the corpus (such as the corpus scanning application) do not need to create Level1Bucket. |
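The speed/memory trade-off of a first-level bucket can be illustrated with a small sketch. It assumes that a bucket records, for each vocabulary ID, the range of suffix array positions whose suffixes start with that ID; this page only confirms that the real S_level1BucketElement has a member named `first`, so the `last` field, the `BucketSketch` type, and `buildLevel1Buckets()` are hypothetical and may differ from the actual implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type. Only a member named `first` is confirmed by the
// reference above; `last` is an assumption made for this sketch.
struct BucketSketch {
    long first = -1;  // first suffix array position whose suffix starts with this ID
    long last = -1;   // last such position; -1 means the ID never occurs
};

// Builds one bucket per vocabulary ID from a corpus of IDs and its suffix array.
// The suffix array is sorted, so all suffixes that start with the same ID
// occupy one contiguous range of positions.
std::vector<BucketSketch> buildLevel1Buckets(const std::vector<unsigned>& corpus,
                                             const std::vector<std::size_t>& suffixArray,
                                             unsigned vocSize) {
    std::vector<BucketSketch> buckets(vocSize);
    for (std::size_t pos = 0; pos < suffixArray.size(); ++pos) {
        const unsigned id = corpus[suffixArray[pos]];
        if (buckets[id].first < 0) {
            buckets[id].first = static_cast<long>(pos);
        }
        buckets[id].last = static_cast<long>(pos);
    }
    return buckets;
}
```

Under these assumptions, a search for an n-gram can start its binary search inside the bucket of the n-gram's first word instead of over the whole suffix array, at the cost of one bucket element per vocabulary ID.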
Definition at line 60 of file _SuffixArrayApplicationBase.cpp.
References loadCorpusAndInitMem(), loadOffset(), loadSuffix(), loadVoc(), and noVocabulary.
Referenced by C_SuffixArrayScanningBase::C_SuffixArrayScanningBase(), and C_SuffixArraySearchApplicationBase::loadData_forSearch().
| TextLenType C_SuffixArrayApplicationBase::returnCorpusSize | ( | ) |
| void C_SuffixArrayApplicationBase::loadVoc | ( | const char * | filename | ) | [protected] |
Definition at line 105 of file _SuffixArrayApplicationBase.cpp.
References voc.
Referenced by loadData().
| void C_SuffixArrayApplicationBase::loadOffset | ( | const char * | filename | ) | [protected] |
Definition at line 264 of file _SuffixArrayApplicationBase.cpp.
References SIZE_ONE_READ.
Referenced by loadData().
| void C_SuffixArrayApplicationBase::loadSuffix | ( | const char * | filename | ) | [protected] |
Definition at line 186 of file _SuffixArrayApplicationBase.cpp.
References corpusSize, sentIdStart, and SIZE_ONE_READ.
Referenced by loadData().
| void C_SuffixArrayApplicationBase::loadCorpusAndInitMem | ( | const char * | filename | ) | [protected] |
Definition at line 110 of file _SuffixArrayApplicationBase.cpp.
References corpus_list, corpusSize, level1BucketElement::first, level1Buckets, offset_list, sentIdStart, SIZE_ONE_READ, suffix_list, vocIdForCorpusEnd, vocIdForSentEnd, and vocIdForSentStart.
Referenced by loadData().
TextLenType C_SuffixArrayApplicationBase::corpusSize [protected] |
Definition at line 33 of file _SuffixArrayApplicationBase.h.
Referenced by loadCorpusAndInitMem(), C_SuffixArraySearchApplicationBase::loadData_forSearch(), loadSuffix(), and returnCorpusSize().
bool C_SuffixArrayApplicationBase::noVocabulary [protected] |
Definition at line 40 of file _SuffixArrayApplicationBase.h.
Referenced by C_SuffixArrayApplicationBase(), and loadData().
bool C_SuffixArrayApplicationBase::noOffset [protected] |
Definition at line 41 of file _SuffixArrayApplicationBase.h.
Referenced by C_SuffixArrayApplicationBase(), and C_SuffixArraySearchApplicationBase::C_SuffixArraySearchApplicationBase().
bool C_SuffixArrayApplicationBase::noLevel1Bucket [protected] |
Definition at line 42 of file _SuffixArrayApplicationBase.h.
Referenced by C_SuffixArrayApplicationBase(), and C_SuffixArraySearchApplicationBase::C_SuffixArraySearchApplicationBase().
C_IDVocabulary* C_SuffixArrayApplicationBase::voc [protected] |
Definition at line 44 of file _SuffixArrayApplicationBase.h.
Referenced by C_SuffixArrayLanguageModel::C_SuffixArrayLanguageModel(), C_SuffixArrayScanningBase::initializeForScanning(), loadVoc(), C_SuffixArrayLanguageModel::returnVocId(), and ~C_SuffixArrayApplicationBase().
IndexType C_SuffixArrayApplicationBase::sentIdStart [protected] |
Definition at line 45 of file _SuffixArrayApplicationBase.h.
Referenced by loadCorpusAndInitMem(), C_SuffixArraySearchApplicationBase::loadData_forSearch(), loadSuffix(), and C_SuffixArraySearchApplicationBase::locateSendIdFromPos().
IndexType C_SuffixArrayApplicationBase::vocIdForSentStart [protected] |
Reimplemented in C_SuffixArrayScanningBase.
Definition at line 46 of file _SuffixArrayApplicationBase.h.
Referenced by loadCorpusAndInitMem().
IndexType C_SuffixArrayApplicationBase::vocIdForSentEnd [protected] |
Reimplemented in C_SuffixArrayScanningBase.
Definition at line 47 of file _SuffixArrayApplicationBase.h.
Referenced by loadCorpusAndInitMem().
IndexType C_SuffixArrayApplicationBase::vocIdForCorpusEnd [protected] |
Reimplemented in C_SuffixArrayScanningBase.
Definition at line 48 of file _SuffixArrayApplicationBase.h.
Referenced by loadCorpusAndInitMem().
IndexType* C_SuffixArrayApplicationBase::corpus_list [protected] |
unsigned char* C_SuffixArrayApplicationBase::offset_list [protected] |
Definition at line 51 of file _SuffixArrayApplicationBase.h.
Referenced by loadCorpusAndInitMem(), and C_SuffixArraySearchApplicationBase::locateSendIdFromPos().
TextLenType* C_SuffixArrayApplicationBase::suffix_list [protected] |
Definition at line 54 of file _SuffixArrayApplicationBase.h.
Referenced by C_SuffixArrayApplicationBase(), C_SuffixArraySearchApplicationBase::C_SuffixArraySearchApplicationBase(), C_SuffixArraySearchApplicationBase::constructNgramSearchTable4SentWithLCP(), loadCorpusAndInitMem(), and C_SuffixArraySearchApplicationBase::locateSAPositionRangeForExactPhraseMatch().