The Dragon Toolkit

Example: Topic Modeling

The Dragon Toolkit implements several state-of-the-art topic models, including the Aspect Model [1], the LDA model [2], and the simple mixture model [3].
Data Package

Download the data package for this example, decompress it, and place the files under the working directory with the following structure:
workdir/indexisi/jasist.collection
workdir/indexisi/ml.collection
workdir/indexisi/topiccfg.xml
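
Before running anything, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the toolkit: the function name check_data_package is hypothetical, and only the three file paths are taken from the listing above.

```shell
#!/bin/sh
# check_data_package DIR
# Verify that the three example files exist under DIR (default: current directory).
# Hypothetical helper; only the relative paths come from the example's data package.
check_data_package() {
    workdir="${1:-.}"
    missing=0
    for f in indexisi/jasist.collection indexisi/ml.collection indexisi/topiccfg.xml; do
        if [ ! -f "$workdir/$f" ]; then
            # Report each missing file on stderr and remember the failure.
            echo "missing: $workdir/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 when every file is present, 1 otherwise.
    return "$missing"
}
```

Calling check_data_package workdir from the parent directory should return success once the package has been unpacked as described.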

Dependency
Before running this example, download the external JAR library for Java Excel and the external NLP data for the examples.
Run the example
  1. LDA Model
    java -mx1000000000 -oss10000000000 dragon.config.TopicModelAppConfig indexisi/topiccfg.xml 1
  2. Aspect Model
    java -mx1000000000 -oss10000000000 dragon.config.TopicModelAppConfig indexisi/topiccfg.xml 2
  3. Simple Mixture Model
    java -mx1000000000 -oss10000000000 dragon.config.TopicModelAppConfig indexisi/topiccfg.xml 3
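
The three commands above differ only in their final argument, so they can be wrapped in a small helper. This is a sketch under stated assumptions: the function names and the DRY_RUN switch are hypothetical and not toolkit options, the commands are assumed to be run from the working directory, and the toolkit classes are assumed to be reachable through the CLASSPATH environment variable. By default the helper only prints the command.

```shell
#!/bin/sh
# Map the trailing numeric argument used in the commands above to a model name.
model_name() {
    case "$1" in
        1) echo "LDA Model" ;;
        2) echo "Aspect Model" ;;
        3) echo "Simple Mixture Model" ;;
        *) echo "unknown model id: $1" >&2; return 1 ;;
    esac
}

# run_topic_model ID
# Print the command for one model id, or execute it when DRY_RUN=0.
# DRY_RUN is a hypothetical switch introduced for this sketch only.
run_topic_model() {
    name=$(model_name "$1") || return 1
    echo "== $name =="
    # Rebuild the positional parameters as the full command line from the text.
    set -- java -mx1000000000 -oss10000000000 dragon.config.TopicModelAppConfig indexisi/topiccfg.xml "$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For instance, a loop such as for id in 1 2 3; do run_topic_model "$id"; done lists the three commands in order; setting DRY_RUN=0 would execute them instead.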
References
  1. Hofmann, T., "Probabilistic latent semantic indexing," Proceedings of Uncertainty in Artificial Intelligence, 1999, pp. 50-57.
  2. Blei, D., Ng, A., and Jordan, M., "Latent Dirichlet allocation," Journal of Machine Learning Research, 3:993-1022, January 2003.
  3. Zhai, C., Velivelli, A., and Yu, B., "A cross-collection mixture model for comparative text mining," Proceedings of ACM KDD 2004 (KDD'04), 2004, pp. 743-748.