Package dragon.nlp

Define data structures used for natural language processing


Interface Summary
Concept Interface for a concept, which can be a single word, a multiword phrase, or an ontological term
DocumentParser Parse a text document into paragraphs, sentences and words
 

Class Summary
Counter A lightweight integer counter
Document Data structure for document
Paragraph Data structure for paragraph of document
Phrase Data structure for phrase
Sentence Data structure for sentence extracted from text
SimpleElement This is a light data structure for a data element that can be used for simple pairs
SimpleElementList The list data structure for simple element data
SimplePair This is a light data structure for pair data
SimplePairList Simple pair list is the list for pair data elements
Term Term is designed for Unified Medical Language System (UMLS) medical concepts; it can store an n-gram together with its semantic and statistical information
Token Token is a compact unit data structure for handling NLP tasks or other operations
Triple Triple is a data structure for a binary relation extracted from text
Word This is a supporting class for the Term and Phrase classes, which can contain multiple words
 

Package dragon.nlp Description

Define data structures used for natural language processing

Package Specification

Document, Paragraph, Sentence and Word are designed for text parsing and tokenization during natural language processing. Three different types of concepts are implemented for different uses. A token often corresponds to a single word. A phrase consists of multiple adjacent words. A term denotes an ontological concept, so it can consist of multiple adjacent words, as a phrase does. In addition, a term often has a unique entry ID defined in the domain ontology.
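
The distinction between the three concept types can be illustrated with a minimal, self-contained sketch. It does not use or reproduce the dragon.nlp API: every name in it (ConceptSketch, SketchConcept, SketchToken, SketchPhrase, SketchTerm, describe) is hypothetical, and the entry ID "ID-0001" is a made-up placeholder rather than a real ontology identifier. The only thing it takes from the description above is the idea itself: a token is one word, a phrase is several adjacent words, and a term is a word sequence that also carries an ontology entry ID.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: these types are hypothetical and are NOT the
// dragon.nlp classes. They mirror the three concept kinds described above.
public class ConceptSketch {

    /** Common view over the three concept kinds (assumed for this sketch). */
    public interface SketchConcept {
        String kind();   // "token", "phrase" or "term"
        String name();   // surface text of the concept
        int wordCount(); // number of words the concept spans
    }

    /** A token: corresponds to a single word. */
    public static final class SketchToken implements SketchConcept {
        private final String word;
        public SketchToken(String word) { this.word = word; }
        public String kind() { return "token"; }
        public String name() { return word; }
        public int wordCount() { return 1; }
    }

    /** A phrase: multiple adjacent words, with no ontology link. */
    public static final class SketchPhrase implements SketchConcept {
        private final List<String> words;
        public SketchPhrase(List<String> words) { this.words = words; }
        public String kind() { return "phrase"; }
        public String name() { return String.join(" ", words); }
        public int wordCount() { return words.size(); }
    }

    /** A term: adjacent words plus an entry ID from a domain ontology. */
    public static final class SketchTerm implements SketchConcept {
        private final List<String> words;
        private final String entryId; // placeholder ID, not a real ontology entry
        public SketchTerm(List<String> words, String entryId) {
            this.words = words;
            this.entryId = entryId;
        }
        public String kind() { return "term"; }
        public String name() { return String.join(" ", words); }
        public int wordCount() { return words.size(); }
        public String entryId() { return entryId; }
    }

    /** Formats a concept; only terms contribute an entry ID. */
    public static String describe(SketchConcept c) {
        String id = (c instanceof SketchTerm)
                ? " [" + ((SketchTerm) c).entryId() + "]"
                : "";
        return c.kind() + ": " + c.name() + id;
    }

    public static void main(String[] args) {
        List<SketchConcept> concepts = Arrays.<SketchConcept>asList(
                new SketchToken("heart"),
                new SketchPhrase(Arrays.asList("heart", "attack")),
                new SketchTerm(Arrays.asList("heart", "attack"), "ID-0001"));
        for (SketchConcept c : concepts) {
            System.out.println(describe(c));
        }
    }
}
```

In this sketch the phrase and the term hold the same words; the entry ID is the only thing that separates them, which matches the description above. How the actual Token, Phrase and Term classes relate to the Concept interface should be checked against their own class pages.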