The Dragon Toolkit

Example: Text Retrieval Using Multiword Phrases

The Dragon Toolkit was originally designed for the language modeling approach to information retrieval. Each search task can be viewed as a combination of a searching component, a smoothing component, and a feedback component. The current version includes a large number of smoothing and feedback components reported in the literature, covering language models, traditional probabilistic models, and vector space models. The example below is taken from [1], which uses multiword phrases to smooth both query language models and document language models.
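To make the smoothing idea concrete, the sketch below assumes the document model described in [1]: a base word model p_b(w|d) is interpolated with a translation component that maps the multiword phrases t_k found in d to individual words,

    p(w|d) = (1 - λ) p_b(w|d) + λ Σ_k p(w|t_k) p_ml(t_k|d)

The class and method names are hypothetical illustrations, not the Dragon Toolkit API, and the choice of p_b (for example a Dirichlet-smoothed word model) is left to the caller.

```java
import java.util.Map;

/**
 * Hypothetical illustration (not the Dragon Toolkit API) of phrase-based semantic smoothing:
 * p(w|d) = (1 - lambda) * pBase(w|d) + lambda * sum_k p(w|t_k) * pMl(t_k|d)
 */
final class SemanticSmoothing {

    private SemanticSmoothing() {}

    /**
     * @param baseProb       pBase(w|d) for the word w, e.g. from a Dirichlet-smoothed model
     * @param phraseDocProbs pMl(t_k|d): maximum-likelihood probabilities of the phrases in d
     * @param translation    p(w|t_k) for the same word w, keyed by phrase t_k
     * @param lambda         weight of the translation component, in [0, 1]
     */
    static double smoothedProb(double baseProb,
                               Map<String, Double> phraseDocProbs,
                               Map<String, Double> translation,
                               double lambda) {
        if (lambda < 0.0 || lambda > 1.0) {
            throw new IllegalArgumentException("lambda must be in [0, 1]");
        }
        // Translation component: sum over the phrases of d of p(w|t_k) * pMl(t_k|d).
        double translated = 0.0;
        for (Map.Entry<String, Double> e : phraseDocProbs.entrySet()) {
            translated += translation.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        // Linear interpolation of the base model and the translation component.
        return (1.0 - lambda) * baseProb + lambda * translated;
    }
}
```

With λ = 0 the formula reduces to the base model. With λ > 0, a document containing the phrase "space program" can give probability to a word such as "nasa" through p(nasa|space program), even if "nasa" never occurs in the document.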
Data Package

Download the AP89 collection and configuration files for the example. Please decompress the package and place these files under the working directory with the following structure:
workdir/indextrec/ap89/AP
workdir/indextrec/ap89/topics.txt
workdir/indextrec/ap89/queryporter.txt
workdir/indextrec/ap89/queryporterexapnd.txt
workdir/indextrec/ap89/queryporterexapndw.txt
workdir/indextrec/ap89/indexcfg.xml
workdir/indextrec/ap89/wordircfg.xml
workdir/indextrec/ap89/querycfg.xml
workdir/indextrec/ap89/judgments.txt
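Every path in the listing above must exist before the commands are run. The hypothetical helper below (not part of the Dragon Toolkit) reports which of those paths are missing under a given working directory; the file names are copied verbatim from the listing.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the toolkit): lists missing AP89 example files. */
final class Ap89Layout {

    // Relative paths copied from the data-package listing.
    static final List<String> REQUIRED = List.of(
        "indextrec/ap89/AP",
        "indextrec/ap89/topics.txt",
        "indextrec/ap89/queryporter.txt",
        "indextrec/ap89/queryporterexapnd.txt",
        "indextrec/ap89/queryporterexapndw.txt",
        "indextrec/ap89/indexcfg.xml",
        "indextrec/ap89/wordircfg.xml",
        "indextrec/ap89/querycfg.xml",
        "indextrec/ap89/judgments.txt");

    private Ap89Layout() {}

    /** Returns the required paths (relative to workdir) that do not exist. */
    static List<String> missing(Path workdir) {
        List<String> absent = new ArrayList<>();
        for (String rel : REQUIRED) {
            if (!Files.exists(workdir.resolve(rel))) {
                absent.add(rel);
            }
        }
        return absent;
    }
}
```

Calling `Ap89Layout.missing(Path.of("workdir"))` returns an empty list once the package has been unpacked correctly.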

Dependency
Before running this example, please download the external jar library for the Hepple Tagger and the external NLP data for the POS Tagger, English Lemmatiser and Examples.
Run the Example
    Indexing
  1. Run the command below to build the word-based index.
    java -mx1000000000 -oss10000000000 dragon.config.IndexAppConfig indextrec/ap89/indexcfg.xml 1
  2. Run the command below to build the phrase-based index.
    java -mx1000000000 -oss10000000000 dragon.config.IndexAppConfig indextrec/ap89/indexcfg.xml 2
  3. Run the command below to build the phrase-word mapping for query expansion.
    java -mx1000000000 -oss10000000000 dragon.config.TranslationAppConfig indextrec/ap89/indexcfg.xml 1
  4. Run the command below to build the phrase-word mapping for document expansion.
    java -mx1000000000 -oss10000000000 dragon.config.TranslationAppConfig indextrec/ap89/indexcfg.xml 2
  5. Run the command below to generate the word-word co-occurrence matrix.
    java -mx1500000000 -oss15000000000 dragon.config.CooccurrenceAppConfig indextrec/ap89/indexcfg.xml 1
  6. Run the command below to build the word-word mapping.
    java -mx1000000000 -oss10000000000 dragon.config.TranslationAppConfig indextrec/ap89/indexcfg.xml 3
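Steps 3, 4 and 6 produce translation mappings such as p(w|t), the probability of word w given phrase t. One common estimator, assumed here rather than taken from the toolkit's source, collects the word counts c(w) from all documents containing t and fits a two-component mixture with EM: each word occurrence is generated either by p(w|t) or by a fixed background collection model p(w|C) with weight α, so words that are common everywhere are absorbed by the background. The same procedure could be run with a word u in place of the phrase t to obtain a word-word mapping. All names below are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: EM estimate of p(w|t) with a fixed background model p(w|C). */
final class TranslationEm {

    private TranslationEm() {}

    /**
     * @param counts     c(w): counts of word w in the documents containing phrase t (all > 0)
     * @param background p(w|C): collection language model
     * @param alpha      background noise weight, in [0, 1)
     * @param iterations number of EM iterations
     */
    static Map<String, Double> estimate(Map<String, Integer> counts,
                                        Map<String, Double> background,
                                        double alpha, int iterations) {
        double total = 0.0;
        for (int c : counts.values()) {
            total += c;
        }
        // Initialise with the maximum-likelihood estimate c(w) / sum of counts.
        Map<String, Double> p = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            p.put(e.getKey(), e.getValue() / total);
        }
        for (int it = 0; it < iterations; it++) {
            Map<String, Double> expected = new HashMap<>();
            double norm = 0.0;
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                String w = e.getKey();
                double topic = (1.0 - alpha) * p.get(w);
                double noise = alpha * background.getOrDefault(w, 0.0);
                // E-step: posterior probability that an occurrence of w came from p(w|t).
                double z = topic / (topic + noise);
                double x = e.getValue() * z;
                expected.put(w, x);
                norm += x;
            }
            // M-step: renormalise the expected counts into a distribution.
            for (Map.Entry<String, Double> e : expected.entrySet()) {
                p.put(e.getKey(), e.getValue() / norm);
            }
        }
        return p;
    }
}
```

With α = 0 the estimate stays at the maximum-likelihood value; larger α discounts words that are frequent throughout the collection.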

    Query Generation
  7. Generate basic query
    java -mx1000000000 -oss10000000000 dragon.config.QueryAppConfig indextrec/ap89/querycfg.xml 1
  8. Expand query by phrase-word mapping
    java -mx1000000000 -oss10000000000 dragon.config.QueryAppConfig indextrec/ap89/querycfg.xml 2
  9. Expand query by word-word mapping
    java -mx1000000000 -oss10000000000 dragon.config.QueryAppConfig indextrec/ap89/querycfg.xml 3
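Steps 8 and 9 expand the basic query with one of these mappings. A minimal sketch, assuming the expanded query model mixes the original query model with the words that the query's phrases (or words) translate into:

    p(w|q') = (1 - γ) p_ml(w|q) + γ Σ_t p(w|t) p_ml(t|q)

The weight γ and every name in the code are hypothetical; the toolkit's own expansion may weight or truncate terms differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of translation-based query expansion. */
final class QueryExpansion {

    private QueryExpansion() {}

    /**
     * @param queryWords   p_ml(w|q): the original query word model
     * @param queryUnits   p_ml(t|q): probabilities of the query units being translated
     * @param translations p(w|t) for each unit t
     * @param gamma        weight of the expansion component, in [0, 1]
     */
    static Map<String, Double> expand(Map<String, Double> queryWords,
                                      Map<String, Double> queryUnits,
                                      Map<String, Map<String, Double>> translations,
                                      double gamma) {
        Map<String, Double> expanded = new HashMap<>();
        // Keep the original query words with weight (1 - gamma).
        queryWords.forEach((w, p) -> expanded.merge(w, (1.0 - gamma) * p, Double::sum));
        // Add translated words with weight gamma * p(w|t) * p(t|q).
        queryUnits.forEach((t, pt) ->
            translations.getOrDefault(t, Map.of()).forEach((w, pwt) ->
                expanded.merge(w, gamma * pwt * pt, Double::sum)));
        return expanded;
    }
}
```

If every input distribution sums to 1, the expanded model also sums to 1.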

    Word-based Retrieval
  10. Retrieval with Okapi Model
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/ap89/wordircfg.xml 1
  11. Retrieval with Two-stage Language Model
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/ap89/wordircfg.xml 2
  12. Retrieval with Two-stage Language Model and Expanded Query (Phrase-word Mapping)
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/ap89/wordircfg.xml 3
  13. Retrieval with Context-sensitive Semantic Smoothing (CSSS) Language Model
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/ap89/wordircfg.xml 4
  14. Retrieval with Two-stage Language Model and Generative Feedback
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/ap89/wordircfg.xml 5
  15. Retrieval with Two-stage Language Model and Phrase-translation Feedback
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/ap89/wordircfg.xml 6
  16. Retrieval with Context-insensitive Semantic Smoothing Language Model
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/ap89/wordircfg.xml 7
  17. Retrieval with Two-stage Language Model and Expanded Query (Word-word Mapping)
    java -mx1000000000 -oss10000000000 dragon.config.RetrievalEvaAppConfig indextrec/ap89/wordircfg.xml 8
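Steps 10 and 11 rely on two standard scoring functions. The sketch below assumes their textbook forms: Okapi BM25 with the Robertson-Sparck Jones idf (k1 and b are commonly set around 1.2 and 0.75), and Zhai and Lafferty's two-stage smoothing, where a Dirichlet prior μ is followed by interpolation with weight λ and the collection model stands in for the user background model. The toolkit's implementations and defaults may differ; all names are hypothetical.

```java
/** Hypothetical sketch of BM25 and two-stage language-model scoring. */
final class Scoring {

    private Scoring() {}

    /** BM25 weight of one query term in one document (textbook form). */
    static double bm25Term(int tf, int docLen, double avgDocLen,
                           int numDocs, int docFreq, double k1, double b) {
        // Robertson-Sparck Jones idf; negative for terms in more than half the documents.
        double idf = Math.log((numDocs - docFreq + 0.5) / (docFreq + 0.5));
        // Length normalisation: longer-than-average documents are penalised.
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * tf * (k1 + 1.0) / (tf + norm);
    }

    /** Two-stage smoothed p(w|d): Dirichlet prior mu, then interpolation with lambda. */
    static double twoStageProb(int tf, int docLen, double collectionProb,
                               double mu, double lambda) {
        double dirichlet = (tf + mu * collectionProb) / (docLen + mu);
        // Assumption: the collection model also serves as the user background model p(w|U).
        return (1.0 - lambda) * dirichlet + lambda * collectionProb;
    }

    /** Query log-likelihood: sum over query terms of c(w,q) * log p(w|d). */
    static double queryLogLikelihood(int[] queryCounts, double[] docProbs) {
        double score = 0.0;
        for (int i = 0; i < queryCounts.length; i++) {
            score += queryCounts[i] * Math.log(docProbs[i]);
        }
        return score;
    }
}
```

A language-model run ranks documents by the query log-likelihood of their smoothed probabilities, while BM25 sums the per-term weights over the query terms.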

References
  1. Zhou X., Hu X., and Zhang X., "Topic Signature Language Model for Ad hoc Retrieval," accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE).