The Dragon Toolkit (Version 1.3.3, 2008/01/16)
Designed for Text Retrieval and Text Mining
The Dragon Toolkit is a Java-based development package for academic
use in information retrieval (IR) and text mining (TM, including
text classification, text clustering, text summarization, and topic
modeling). It is tailored for researchers who work on large-scale
IR and TM and prefer to program in Java. Unlike Lucene and Lemur,
it provides built-in support for semantic-based IR and TM. The
Dragon Toolkit seamlessly integrates a set of NLP tools, which
enable it to index text collections with various representation
schemes, including words, phrases, ontology-based concepts, and
relationships. To minimize learning time, however, we intentionally
keep the package small and simple. The toolkit therefore lacks some
features, such as distributed IR and cross-language IR, that are
part of the Lemur toolkit.
Another important feature of the toolkit is its scalability. Unlike
many text mining tools such as Weka, the Dragon Toolkit is
specifically designed for large-scale applications. It uses sparse
matrices to implement text representations and does not have to
load all data into memory at run time. It can therefore handle
hundreds of thousands of documents with very limited memory.
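
To make the memory argument concrete, here is a minimal sketch of a sparse document row, assuming a document-term layout in which each document stores only the terms it actually contains. The names SparseRowSketch, SparseRow, nonZeroCount, and dot are hypothetical and chosen for illustration; they are not taken from the Dragon Toolkit API, and the toolkit's own matrix classes may be organized differently.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; these names are hypothetical, not Dragon Toolkit classes. */
public class SparseRowSketch {

    /** One document row: parallel arrays of term indices (ascending) and term counts. */
    static final class SparseRow {
        final int[] termIndex;
        final int[] frequency;

        SparseRow(Map<Integer, Integer> counts) {
            // A TreeMap yields ascending term indices, which the merge in dot() relies on.
            TreeMap<Integer, Integer> sorted = new TreeMap<>(counts);
            termIndex = new int[sorted.size()];
            frequency = new int[sorted.size()];
            int i = 0;
            for (Map.Entry<Integer, Integer> e : sorted.entrySet()) {
                termIndex[i] = e.getKey();
                frequency[i] = e.getValue();
                i++;
            }
        }

        /** Number of stored cells; independent of the vocabulary size. */
        int nonZeroCount() {
            return termIndex.length;
        }

        /** Dot product computed by merging the two sorted index arrays. */
        long dot(SparseRow other) {
            long sum = 0;
            int a = 0;
            int b = 0;
            while (a < termIndex.length && b < other.termIndex.length) {
                if (termIndex[a] == other.termIndex[b]) {
                    sum += (long) frequency[a] * other.frequency[b];
                    a++;
                    b++;
                } else if (termIndex[a] < other.termIndex[b]) {
                    a++;
                } else {
                    b++;
                }
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        // Term indices range up to 750,000, yet each row stores only three cells.
        SparseRow d1 = new SparseRow(Map.of(3, 2, 100_000, 1, 750_000, 4));
        SparseRow d2 = new SparseRow(Map.of(3, 1, 42, 5, 750_000, 2));
        System.out.println("non-zero cells in d1: " + d1.nonZeroCount());
        System.out.println("dot(d1, d2): " + d1.dot(d2));
    }
}
```

Because each row holds only its non-zero cells, storage grows with the number of distinct terms per document rather than with the vocabulary size. Rows stored this way could also be read from disk one at a time, which is one plausible way to avoid loading a whole collection into memory.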
A Full List of Functional Features
The Personnel
Supervisor of the Dragon Project:
Xiaohua (Tony) Hu, thu@ischool.drexel.edu
System Design and Development:
Xiaohua (Davis) Zhou, xiaohua.zhou@drexel.edu
Xiaodan (Tom) Zhang, xiaodan.zhang@drexel.edu
How to Cite Dragon Toolkit
If you use the Dragon Toolkit in your research, please cite it in
your published papers:
Zhou, X., Zhang, X., and Hu, X., "Dragon Toolkit: Incorporating
Auto-learned Semantic Knowledge into Large-Scale Text Retrieval
and Mining," in Proceedings of the 19th IEEE International
Conference on Tools with Artificial Intelligence (ICTAI),
October 29-31, 2007, Patras, Greece [PDF]