Package dragon.ir.index

A package for document indexing and for reading indexing results.


Interface Summary
Indexer Interface of document indexing agent
IndexReader Interface of index reader
IndexWriter Interface of index writer
IRDocIndexList Interface of IRDoc Index List
IRRelationIndexList Interface of IRRelation Index List
IRSignature Interface of IRSignature
IRSignatureIndexList Interface of IRSignature Index List
IRTermIndexList Interface of IRTerm Index List
 

Class Summary
AbstractIndexer AbstractIndexer implements the basic functions of the Indexer interface, such as building the index for an article and setting options for section indexing and relation indexing
AbstractIndexReader AbstractIndexReader implements functions defined in the IndexReader interface, such as getting the termdoc and docterm matrices, getting the termkey and dockey, getting the termindex and docindex, getting all the indexed terms for a given document, and getting all the indexed documents for a given term
AbstractIndexWriteController AbstractIndexWriteController implements basic functions, including generating the IRTermList and IRRelationList for use in later processing.
AbstractIndexWriter The class implements two methods that write the termdoc and docterm matrices to disk, with options for term-based indexing, relation-based indexing, or both
BasicIndexer BasicIndexer, providing all basic functions for indexing, can be used to index documents directly
BasicIndexReader BasicIndexReader is used to read all the indexing information into memory for an indexed collection
BasicIndexWriteController This class implements the function of writing the section index.
BasicIndexWriter The class is used to initialize and write index to disk
BasicIRDocIndexList The class is used to write or load the indexing information for a given document or document set
BasicIRRelationIndexList The class is used to write or load the relation indexing information for a given IR relation or relation set
BasicIRTermIndexList The class is used to write or load the term indexing information for a given IR term or term set
DualIndexer The class handles two dimensional document language model indexing
FileIndex The class is used to read the corresponding files that store indexing information.
IndexConverter Import (export) indexing from (to) text format
IRCollection IRCollection is the data structure for a document collection
IRDoc IRDoc is the data structure for IR document indexing; it can be sorted and compared by weight and index
IRRelation IRRelation is the basic data structure for a binary relation extracted from a document; it can be sorted and compared by index and frequency.
IRSection IRSection is the data structure for document section indexing
IRTerm This is the basic indexing unit, which can be sorted and compared by index and frequency
OnlineIndexer The class is designed for indexing in computer memory, especially for small document sets where writing to disk is not necessary.
OnlineIndexReader The class is used to read indexing information from memory
OnlineIndexWriteController The class is the same as BasicIndexWriteController except that it does not store information to disk
OnlineIndexWriter The class is the same as BasicIndexWriter except that it does not store information to disk.
OnlineIRDocIndexList The class is used to add or get a given IRDoc
OnlineIRRelationIndexList The class is used to add or get a given IR relation
OnlineIRTermIndexList The class is used to add or get a given IR term
TransposeIRMatrix The class is used to transpose a given IR indexing matrix such as termdoc to docterm
 

Package dragon.ir.index Description

A package for document indexing and for reading indexing results.

Package Specification

There are two important interfaces. One is Indexer, which indexes the articles in a corpus. The other is IndexReader, which reads out the indexing results. The dragon toolkit supports two modes of indexing. The first mode saves the indexing results to disk-based files and is usually suited to large collections. The second mode keeps all information in memory. It is very convenient and fast, but suitable only for small collections, e.g. collections for text summarization.
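
To make the in-memory mode more concrete, the following is a minimal, self-contained sketch of the general idea: a docterm matrix and a termdoc matrix kept entirely in memory and filled as documents are indexed. It does not use or reproduce the dragon.ir.index API. The class InMemoryIndexSketch and its methods index, termFrequency and documentsContaining are hypothetical names chosen for illustration, and the whitespace-and-punctuation tokenizer is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not a class of dragon.ir.index. */
final class InMemoryIndexSketch {
    // Term string -> term index, assigned in order of first appearance.
    private final Map<String, Integer> termKeys = new HashMap<>();
    // docterm matrix: one row per document, mapping term index -> frequency.
    private final List<Map<Integer, Integer>> docTerm = new ArrayList<>();
    // termdoc matrix: one row per term, mapping document index -> frequency.
    private final List<Map<Integer, Integer>> termDoc = new ArrayList<>();

    /** Indexes one document and returns the document index assigned to it. */
    int index(String text) {
        int docIndex = docTerm.size();
        Map<Integer, Integer> row = new TreeMap<>();
        docTerm.add(row);
        // Assumed tokenizer: lower-case, split on runs of non-word characters.
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            Integer termIndex = termKeys.get(token);
            if (termIndex == null) {
                termIndex = termKeys.size();
                termKeys.put(token, termIndex);
                termDoc.add(new TreeMap<>());
            }
            // Update both matrices so either direction can be read directly.
            row.merge(termIndex, 1, Integer::sum);
            termDoc.get(termIndex).merge(docIndex, 1, Integer::sum);
        }
        return docIndex;
    }

    /** Returns how often the term occurs in the given document. */
    int termFrequency(int docIndex, String term) {
        Integer termIndex = termKeys.get(term.toLowerCase());
        if (termIndex == null) {
            return 0;
        }
        return docTerm.get(docIndex).getOrDefault(termIndex, 0);
    }

    /** Returns the indexes of all documents containing the term, ascending. */
    List<Integer> documentsContaining(String term) {
        Integer termIndex = termKeys.get(term.toLowerCase());
        if (termIndex == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(termDoc.get(termIndex).keySet());
    }

    public static void main(String[] args) {
        InMemoryIndexSketch idx = new InMemoryIndexSketch();
        idx.index("Indexing writes the index to disk");
        idx.index("The reader loads the index");
        System.out.println(idx.documentsContaining("index"));
        System.out.println(idx.termFrequency(1, "the"));
    }
}
```

Because nothing is written to disk, the whole index is lost when the process exits and its size is bounded by available memory, which is why this style fits small collections. A disk-based mode would persist the same two matrices to files instead.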