The Dragon Toolkit

Example: MaxMatcher

MaxMatcher is a biological concept extraction tool that uses dictionary-based approximate matching, with the UMLS 2004AA release as its dictionary. On the GENIA 3.02 corpus, its precision and recall are 71.60% and 75.18%, respectively.
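To illustrate what dictionary-based approximate matching means, here is a minimal, hypothetical Java sketch. It is not MaxMatcher's code or API (the tool's own method is described in reference 3). It assumes a simple weighting scheme: each word in a concept name is weighted by the inverse of the number of dictionary names containing it, weights are normalized per concept, and a concept is reported when the weights of its words found in a text window reach a threshold. The class name, toy dictionary, concept IDs, and the 0.6 threshold are all invented for illustration.

```java
import java.util.*;

/**
 * Hypothetical sketch of approximate dictionary lookup; NOT the MaxMatcher
 * implementation or API. Each word of a concept name gets a weight that is
 * higher when fewer dictionary names contain it; a concept matches a text
 * window when the weights of its words found there sum to at least a
 * threshold, so less significant words may be missing or misspelled.
 */
public class ApproxDictionaryLookup {
    // conceptId -> (word -> normalized weight); one concept's weights sum to 1
    private final Map<String, Map<String, Double>> weights = new HashMap<>();
    private final double threshold;

    public ApproxDictionaryLookup(Map<String, String> dictionary, double threshold) {
        this.threshold = threshold;
        // Count how many concept names contain each word.
        Map<String, Integer> df = new HashMap<>();
        for (String name : dictionary.values()) {
            for (String w : new HashSet<>(tokenize(name))) {
                df.merge(w, 1, Integer::sum);
            }
        }
        // Raw weight = 1 / df, then normalize within each concept name.
        for (Map.Entry<String, String> e : dictionary.entrySet()) {
            Map<String, Double> raw = new HashMap<>();
            for (String w : new HashSet<>(tokenize(e.getValue()))) {
                raw.put(w, 1.0 / df.get(w));
            }
            double total = raw.values().stream().mapToDouble(Double::doubleValue).sum();
            Map<String, Double> norm = new HashMap<>();
            raw.forEach((w, v) -> norm.put(w, v / total));
            weights.put(e.getKey(), norm);
        }
    }

    /** Lowercases and splits on any non-alphanumeric character. */
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Returns sorted IDs of concepts whose matched-word weight reaches the threshold. */
    public List<String> lookup(String window) {
        Set<String> tokens = new HashSet<>(tokenize(window));
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> e : weights.entrySet()) {
            double score = 0.0;
            for (Map.Entry<String, Double> w : e.getValue().entrySet()) {
                if (tokens.contains(w.getKey())) score += w.getValue();
            }
            if (score >= threshold) hits.add(e.getKey());
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        // Toy dictionary with hypothetical concept IDs (not real UMLS CUIs).
        Map<String, String> dict = new HashMap<>();
        dict.put("C-1", "interleukin 2 receptor");
        dict.put("C-2", "interleukin 2");
        dict.put("C-3", "estrogen receptor");
        ApproxDictionaryLookup lookup = new ApproxDictionaryLookup(dict, 0.6);
        System.out.println(lookup.lookup("estrogen receptors")); // prints [C-3]
        System.out.println(lookup.lookup("receptor"));           // prints []
    }
}
```

Because "estrogen" carries most of the weight of "estrogen receptor" in this toy dictionary, the plural variant still matches even though "receptors" differs from the dictionary word, while the generic word "receptor" on its own matches nothing. A practical extractor would likely also need lemmatization, abbreviation handling, and span selection, which this sketch omits.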
Data Package

Download the data package for this example. It includes the following files:
maxmatcher/configure.xml
maxmatcher/PMID_7641692.txt
maxmatcher/src/dragon/textieprj/MaxMatcher.java
maxmatcher/src/dragon/textieprj/MaxMatcher.class

Dependency
Before running this example, download the external JAR library for the MedPost Tagger and the external NLP data for the POS Tagger, UMLS, English Lemmatiser, and Examples.
Run the example
  1. Place the "maxmatcher" directory under the working directory
  2. Make sure MaxMatcher.class is on your classpath
  3. Type the command:
    java -mx1000000000 -oss10000000000 dragon.textieprj.MaxMatcher
  4. You will see the result file, PMID_7641692.term, under the "maxmatcher" directory
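Steps 1 and 2 can be checked before launching the example. The following is a hypothetical helper, not part of the Dragon Toolkit, assuming that the data files listed under Data Package must sit in a "maxmatcher" directory below the working directory; the class name MaxMatcherPreflight is invented for this sketch.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical pre-flight check (not part of the Dragon Toolkit) for the
 * setup described in steps 1 and 2 of "Run the example".
 */
public class MaxMatcherPreflight {
    // Data files from the package, relative to the working directory.
    static final String[] REQUIRED = {
        "maxmatcher/configure.xml",
        "maxmatcher/PMID_7641692.txt"
    };

    /** Returns the required files that are missing under baseDir. */
    static List<String> missingFiles(Path baseDir) {
        List<String> missing = new ArrayList<>();
        for (String rel : REQUIRED) {
            if (!Files.isRegularFile(baseDir.resolve(rel))) missing.add(rel);
        }
        return missing;
    }

    /** True if the named class can be loaded (without initializing it) from the classpath. */
    static boolean onClasspath(String className) {
        try {
            Class.forName(className, false, MaxMatcherPreflight.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Path cwd = Paths.get("").toAbsolutePath();
        List<String> missing = missingFiles(cwd);
        System.out.println(missing.isEmpty() ? "Data files: OK" : "Missing: " + missing);
        System.out.println("dragon.textieprj.MaxMatcher on classpath: "
                + onClasspath("dragon.textieprj.MaxMatcher"));
        // Maximum heap this JVM will use; the -mx1000000000 flag in step 3 asks for roughly 1 GB.
        System.out.println("Max heap (bytes): " + Runtime.getRuntime().maxMemory());
    }
}
```

Run it from the same working directory and with the same classpath you intend to use for dragon.textieprj.MaxMatcher, so that both checks reflect the real launch conditions.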
References
  1. Zhou X., Zhang X., Hu X., "Using Concept-based Indexing to Improve Language Modeling Approach to Genomic IR", in the 28th European Conference on Information Retrieval (ECIR 2006), April 10-12, 2006, London, UK, pp. 444-455
  2. Zhou X., Hu X., Zhang X., Lin X., Song I.-Y., "Context-Sensitive Semantic Smoothing for the Language Modeling Approach to Genomic IR", in the 29th Annual International ACM SIGIR Conference (SIGIR 2006), Aug 6-11, 2006, Seattle, WA, USA
  3. Zhou X., Zhang X., Hu X., "MaxMatcher: Biological Concept Extraction Using Approximate Dictionary Lookup", in the 9th Biennial Pacific Rim International Conference on Artificial Intelligence (PRICAI 2006), Aug 9-11, 2006, Guilin, Guangxi, China, pp. 1145-1149