Abstract class for concept extraction. It is the superclass of AbstractPhraseExtractor,
AbstractTermExtractor, AbstractTokenExtractor, and AbstractTripleExtractor
The abstract flat sparse matrix handles sparse matrices smaller than those handled by the super sparse matrix, and adds options for reading and storing
matrix data on disk in either text or binary format
The abstract sparse matrix for handling extremely large sparse matrices. It writes matrix data to disk whenever
the flush interval (1,000,000 by default) is exceeded, and in this respect it is superior to AbstractSuperSparseMatrix. However, it lacks some
basic matrix operations, such as getNonZeroColumnInRow, that AbstractSuperSparseMatrix provides, because it focuses
on storing the matrix to disk and loading it back efficiently
AbstractIndexReader implements the functions defined in the IndexReader interface, such as getting the
termdoc and docterm matrices, getting the termkey and dockey, getting the termindex and docindex,
getting all the indexed terms for a given document, and getting all the indexed documents for a given term
...
The abstract super sparse matrix is designed for large sparse matrices. It first caches data, then processes the cached data
and writes it to disk when the flush interval is exceeded
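The cache-then-flush behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the toolkit's actual implementation: the class name BufferedSparseMatrixSketch, the Cell record, the comma-separated text output, and the choice to flush as soon as the cache reaches the interval are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a sparse matrix that caches cells and flushes them to disk. */
final class BufferedSparseMatrixSketch {
    /** One non-zero cell of the matrix. */
    record Cell(int row, int col, double value) {}

    private final List<Cell> cache = new ArrayList<>();
    private final Writer out;          // destination for flushed cells (assumed text format)
    private final int flushInterval;   // assumed: number of cached cells that triggers a flush
    private int flushCount = 0;

    BufferedSparseMatrixSketch(Writer out, int flushInterval) {
        this.out = out;
        this.flushInterval = flushInterval;
    }

    /** Caches a cell and flushes once the cache reaches the flush interval. */
    void add(int row, int col, double value) {
        cache.add(new Cell(row, col, value));
        if (cache.size() >= flushInterval) {
            flush();
        }
    }

    /** Processes (sorts by row, then column) the cached cells, writes them out, clears the cache. */
    void flush() {
        if (cache.isEmpty()) {
            return;
        }
        cache.sort(Comparator.comparingInt(Cell::row).thenComparingInt(Cell::col));
        try {
            for (Cell c : cache) {
                out.write(c.row() + "," + c.col() + "," + c.value() + "\n");
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        cache.clear();
        flushCount++;
    }

    int flushCount() { return flushCount; }

    int cachedCells() { return cache.size(); }
}
```

A caller would normally invoke flush() once more after the last add so that a partially filled cache is not lost.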
The double flat sparse matrix handles data smaller than that handled by the super sparse matrix; however, it provides options
for storing data on disk in either binary or text format
If the original label set is A (0), B (1), C (2), D (3) and the Markov order is 2, the new label set will be:
AA, BA, CA, DA, AB, BB, CB, DB, AC, BC, CC, DC, AD, BD, CD, DD
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
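The numbering can be reproduced with simple arithmetic, assuming the first label of each pair varies fastest (as in AA, BA, CA, DA for indices 0 to 3). The sketch below is only an illustration; MarkovLabelSketch, encode, and name are hypothetical names, and which of the two labels is the previous or the current one is not stated in the text.

```java
/** Hypothetical sketch of the order-2 label numbering. */
final class MarkovLabelSketch {
    /** Combined index of a label pair; the first label varies fastest (assumption). */
    static int encode(int first, int second, int labelNum) {
        return first + second * labelNum;
    }

    /** Readable name of a combined index, e.g. 4 -> "AB" for labels A, B, C, D. */
    static String name(int index, String[] labels) {
        int n = labels.length;
        return labels[index % n] + labels[index / n];
    }

    public static void main(String[] args) {
        String[] labels = {"A", "B", "C", "D"};
        // Prints every combined index next to its label pair.
        for (int i = 0; i < labels.length * labels.length; i++) {
            System.out.println(i + " " + name(i, labels));
        }
    }
}
```

Under this assumption the pair BA maps to 1 + 0 * 4 = 1 and the pair AB maps to 0 + 1 * 4 = 4.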
Randomly split the documents in the answerKeyFile into a training set and a testing set
according to the percentage parameter, and then evaluate the classifier.
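A random split of this kind might look like the sketch below. It covers only the split, not the evaluation, and it assumes that the percentage is a fraction between 0 and 1 giving the share of documents placed in the training set. TrainTestSplitSketch, Split, and the seed parameter are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of a random training/testing split over document keys. */
final class TrainTestSplitSketch {
    /** The two disjoint document sets produced by the split. */
    record Split(List<String> training, List<String> testing) {}

    /**
     * Shuffles the document keys and cuts the list at the training fraction.
     * trainingPercentage is assumed to be a fraction in [0, 1].
     */
    static Split split(List<String> docKeys, double trainingPercentage, long seed) {
        if (trainingPercentage < 0.0 || trainingPercentage > 1.0) {
            throw new IllegalArgumentException("percentage must be in [0, 1]");
        }
        List<String> shuffled = new ArrayList<>(docKeys);
        Collections.shuffle(shuffled, new Random(seed)); // fixed seed keeps the split repeatable
        int trainSize = (int) Math.round(shuffled.size() * trainingPercentage);
        return new Split(
                new ArrayList<>(shuffled.subList(0, trainSize)),
                new ArrayList<>(shuffled.subList(trainSize, shuffled.size())));
    }
}
```

Passing an explicit seed is a design choice for the sketch: it makes an evaluation run reproducible, while a different seed yields a different random split.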
This can be used as a wrapper around a FeatureType class that wants to
generate features which take into account the normalized distance of the token to
the start of a segment.
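One plausible reading of "normalized distance" is the token's offset from the segment start divided by the segment length. The sketch below assumes that definition; it is not taken from the wrapper class itself, and SegmentPositionSketch and normalizedDistance are hypothetical names.

```java
/** Hypothetical sketch of a normalized token-to-segment-start distance. */
final class SegmentPositionSketch {
    /**
     * Returns the offset of the token from the segment start, scaled into [0, 1).
     * Assumed definition: (tokenIndex - segmentStart) / segmentLength.
     */
    static double normalizedDistance(int tokenIndex, int segmentStart, int segmentLength) {
        if (segmentLength <= 0) {
            throw new IllegalArgumentException("segment length must be positive");
        }
        int offset = tokenIndex - segmentStart;
        if (offset < 0 || offset >= segmentLength) {
            throw new IllegalArgumentException("token lies outside the segment");
        }
        return (double) offset / segmentLength;
    }
}
```

For example, the token at index 5 in a segment that starts at index 3 and spans 4 tokens has offset 2, so its normalized distance is 2 / 4 = 0.5. A wrapping feature could combine this value with the features generated by the wrapped class.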