The Dragon Toolkit

Example: Text Retrieval Using Ontology

The Dragon Toolkit was initially designed for the language modeling approach to information retrieval. Each search task can be viewed as a combination of a searching component, a smoothing component, and a feedback component. The current version includes a large number of smoothing and feedback components reported in the literature, covering language models, traditional probabilistic models, and vector space models. The example shown below is from [1]. Note that results may differ slightly from those reported in [1] because the indexing component was modified slightly after the paper was published.
Data Package

Download the genomics collection and configuration files for the example. Please decompress the package and place these files under the working directory with the following structure:
workdir/indextrec/genomic04/genomic04.collection
workdir/indextrec/genomic04/genomic04.index
workdir/indextrec/genomic04/query_word.txt
workdir/indextrec/genomic04/query_concept.txt
workdir/indextrec/genomic04/indexcfg.xml
workdir/indextrec/genomic04/wordircfg.xml
workdir/indextrec/genomic04/conceptircfg.xml
workdir/indextrec/genomic04/judgments.txt
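
Before moving on, it can help to confirm that the package was unpacked into the layout listed above. The following is a minimal sketch only; the function name check_genomic04 is hypothetical and not part of the toolkit, and it assumes a POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Dragon Toolkit): report which of the
# files listed above are absent under <workdir>/indextrec/genomic04.
# The argument is the working directory; it defaults to the current directory.
check_genomic04() {
    base="${1:-.}/indextrec/genomic04"
    missing=0
    for f in genomic04.collection genomic04.index query_word.txt \
             query_concept.txt indexcfg.xml wordircfg.xml \
             conceptircfg.xml judgments.txt; do
        # -e accepts both plain files and directories
        if [ ! -e "$base/$f" ]; then
            echo "missing: $base/$f"
            missing=1
        fi
    done
    # exit status 0 when everything is present, 1 otherwise
    return "$missing"
}
```

Calling check_genomic04 workdir prints one line per absent entry and returns a non-zero status if anything is missing.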

Dependency
Please download the external jar library for MedPost Tagger and external NLP data for POS Tagger, UMLS, English Lemmatiser and Examples before running this example.
Run the example
    Indexing
  1. Run the command below to do word-based indexing (about 12 minutes)
    java -mx1000000000 -oss10000000000 dragon.config.IndexAppConfig indextrec/genomic04/indexcfg.xml 1
  2. Run the command below to do concept-based indexing (about 2 hours)
    java -mx1000000000 -oss10000000000 dragon.config.IndexAppConfig indextrec/genomic04/indexcfg.xml 2
  3. Run the command below to do relationship-term mapping (about 10 minutes)
    java -mx1000000000 -oss10000000000 dragon.config.TranslationAppConfig indextrec/genomic04/indexcfg.xml 1
  4. Run the command below to do concept-word mapping (about 10 minutes)
    java -mx1000000000 -oss10000000000 dragon.config.TranslationAppConfig indextrec/genomic04/indexcfg.xml 3
  5. Run the command below to generate the word-word co-occurrence matrix
    java -mx1500000000 -oss15000000000 dragon.config.CooccurrenceAppConfig indextrec/genomic04/indexcfg.xml 1
  6. Run the command below to do word-word mapping (about 10 minutes)
    java -mx1000000000 -oss10000000000 dragon.config.TranslationAppConfig indextrec/genomic04/indexcfg.xml 2
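
Because the six indexing steps take well over two hours in total, it may be convenient to chain them in a script that stops at the first failure. The sketch below copies the commands above verbatim. The names run_step, run_indexing and DRY_RUN are hypothetical, and the script assumes it is started from the working directory with the toolkit and its external libraries already on the Java classpath.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the Dragon Toolkit): run the six indexing
# steps in the documented order. With DRY_RUN=1 the commands are only printed.
CFG=indextrec/genomic04/indexcfg.xml

run_step() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"      # show the command without executing it
    else
        "$@"           # execute the command as given
    fi
}

run_indexing() {
    # '&&' stops the chain as soon as one step exits with a non-zero status
    run_step java -mx1000000000 -oss10000000000 dragon.config.IndexAppConfig "$CFG" 1 &&
    run_step java -mx1000000000 -oss10000000000 dragon.config.IndexAppConfig "$CFG" 2 &&
    run_step java -mx1000000000 -oss10000000000 dragon.config.TranslationAppConfig "$CFG" 1 &&
    run_step java -mx1000000000 -oss10000000000 dragon.config.TranslationAppConfig "$CFG" 3 &&
    run_step java -mx1500000000 -oss15000000000 dragon.config.CooccurrenceAppConfig "$CFG" 1 &&
    run_step java -mx1000000000 -oss10000000000 dragon.config.TranslationAppConfig "$CFG" 2
}
```

Running DRY_RUN=1 run_indexing first prints the six commands for review; calling run_indexing without DRY_RUN executes them. Stopping on failure relies on each application returning a non-zero exit status when it fails, which is an assumption here.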

    Word-based Retrieval
  1. Retrieval with Two-stage Language Model
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/genomic04/wordircfg.xml 1
  2. Retrieval with Okapi Model
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/genomic04/wordircfg.xml 2
  3. Retrieval with Two-stage Language Model and Generative Feedback
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/genomic04/wordircfg.xml 3
  4. Retrieval with JM Smoothing and Information Flow Feedback
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/genomic04/wordircfg.xml 4
  5. Retrieval with Context-insensitive Semantic Smoothing Language Model
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/genomic04/wordircfg.xml 5
  6. Retrieval with Context-sensitive Semantic Smoothing (CSSS) Language Model
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/genomic04/wordircfg.xml 6

    Concept-based Retrieval
  1. Retrieval with JM Smoothing
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/genomic04/conceptircfg.xml 1
  2. Retrieval with Okapi Model
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/genomic04/conceptircfg.xml 2
  3. Retrieval with Topic Signature Translation Smoothing
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/genomic04/conceptircfg.xml 3
  4. Retrieval with JM Smoothing and Generative Feedback
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/genomic04/conceptircfg.xml 4
  5. Retrieval with JM Smoothing and Relation-Translation Feedback
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/genomic04/conceptircfg.xml 5
  6. Retrieval with Topic Signature Translation Smoothing and Relation-Translation Feedback
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/genomic04/conceptircfg.xml 6
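
The retrieval runs above differ only in the configuration file (wordircfg.xml or conceptircfg.xml) and in the trailing option number, so they can be generated in a loop. The following is a rough sketch under the same assumptions as the commands themselves (started from the working directory, toolkit on the classpath, indexing already completed). The names retrieval_cmd, run_retrieval and DRY_RUN are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the Dragon Toolkit).

# Print the documented retrieval command for a config file and option number.
retrieval_cmd() {
    echo "java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/genomic04/$1 $2"
}

# Run options 1 to 6 of the given config file in order.
# With DRY_RUN=1 each command is printed but not executed.
run_retrieval() {
    for opt in 1 2 3 4 5 6; do
        cmd=$(retrieval_cmd "$1" "$opt")
        echo ">>> $cmd"
        # $cmd is left unquoted on purpose so it splits into words;
        # this is safe only because no argument contains spaces
        [ "${DRY_RUN:-0}" = 1 ] || $cmd || return 1
    done
}
```

For example, run_retrieval wordircfg.xml covers the word-based runs and run_retrieval conceptircfg.xml the concept-based ones.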
References
  1. Zhou X., Hu X., Zhang X., Lin X., and Song I.-Y., "Context-Sensitive Semantic Smoothing for the Language Modeling Approach to Genomic IR," accepted at the 29th Annual International ACM SIGIR Conference (SIGIR 2006), Aug 6-11, 2006, Seattle, WA, USA
  2. Zhou X., Hu X., and Zhang X., "Topic Signature Language Model for Ad hoc Retrieval," accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE)