The Dragon Toolkit

Example: Text Summarization

The Dragon Toolkit supports generic multi-document summarization. A summarizer should implement the interface called Generic Multi-Document Summarizer: given a collection reader, it returns a summary no longer than a given maximum length. To evaluate the quality of a machine-generated summary, use ROUGE. The Dragon Toolkit implements the N-gram metric of ROUGE.
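To make the N-gram metric concrete, here is a minimal Java sketch of ROUGE-N recall: the number of reference n-grams that also occur in the machine summary (clipped by count), divided by the total number of reference n-grams. It is not the toolkit's implementation. The class name RougeNSketch and its methods are hypothetical, and whitespace tokenization without stemming or stop-word removal is a simplifying assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the Dragon Toolkit.
class RougeNSketch {

    // Count every n-gram (tokens joined by a space) in a tokenized text.
    static Map<String, Integer> countNgrams(List<String> tokens, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String gram = String.join(" ", tokens.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    // ROUGE-N recall: clipped n-gram matches summed over all reference
    // summaries, divided by the total number of reference n-grams.
    static double rougeN(List<String> candidate, List<List<String>> references, int n) {
        Map<String, Integer> candCounts = countNgrams(candidate, n);
        int matched = 0;
        int total = 0;
        for (List<String> ref : references) {
            for (Map.Entry<String, Integer> e : countNgrams(ref, n).entrySet()) {
                total += e.getValue();
                matched += Math.min(e.getValue(), candCounts.getOrDefault(e.getKey(), 0));
            }
        }
        return total == 0 ? 0.0 : (double) matched / total;
    }

    public static void main(String[] args) {
        // Made-up sentences, used only to exercise the methods.
        List<String> candidate = Arrays.asList("the cat sat on the mat".split(" "));
        List<List<String>> refs =
                Arrays.asList(Arrays.asList("the cat was on the mat".split(" ")));
        System.out.println("ROUGE-1: " + rougeN(candidate, refs, 1));
        System.out.println("ROUGE-2: " + rougeN(candidate, refs, 2));
    }
}
```

For the two made-up sentences in main, five of the six reference unigrams and three of the five reference bigrams are matched, so the scores are 5/6 for ROUGE-1 and 3/5 for ROUGE-2.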
Data Package

Download the data package for this example, decompress it, and place the files under the working directory with the following structure:
workdir/evasum/duc2004_2/models
workdir/evasum/duc2004_2/testdata
workdir/evasum/duc2004_2/sumcfg.xml

Dependency
Please download the external NLP data for ROUGE and Examples before running this example.
Run the example
  1. Use LexRank to run DUC 2004 Task 2 (evaluated by ROUGE-1 and ROUGE-2)
    java -mx1000000000 -oss10000000000 dragon.config.SummerizationEvaAppConfig evasum/duc2004_2/sumcfg.xml 1
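For readers unfamiliar with LexRank, the following Java sketch shows the core idea from the referenced paper: sentences are nodes in a graph, two sentences are linked when their cosine similarity reaches a threshold, and a PageRank-style power iteration scores each sentence by its centrality. This is a simplified illustration, not the toolkit's code. The class LexRankSketch, its method names, and the parameter values are hypothetical, and plain term-frequency cosine is used here in place of the idf-modified cosine described in the paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the Dragon Toolkit.
class LexRankSketch {

    // Term-frequency vector of a lower-cased, whitespace-tokenized sentence.
    static Map<String, Double> termVector(String sentence) {
        Map<String, Double> tf = new HashMap<>();
        for (String tok : sentence.toLowerCase().split("\\s+")) {
            if (!tok.isEmpty()) tf.merge(tok, 1.0, Double::sum);
        }
        return tf;
    }

    // Cosine similarity between two sparse term vectors.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Thresholded LexRank: link sentences whose similarity reaches the
    // threshold, then run power iteration with a random-jump probability.
    static double[] lexRank(String[] sentences, double threshold, double jump, int iterations) {
        int n = sentences.length;
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (String s : sentences) vectors.add(termVector(s));

        double[][] adj = new double[n][n];
        double[] degree = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (cosine(vectors.get(i), vectors.get(j)) >= threshold) {
                    adj[i][j] = 1.0;
                    degree[i]++;
                }
            }
        }

        double[] p = new double[n];
        Arrays.fill(p, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, jump / n);
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (degree[i] == 0) {
                        // A node with no links spreads its score uniformly.
                        next[j] += (1 - jump) * p[i] / n;
                    } else if (adj[i][j] > 0) {
                        next[j] += (1 - jump) * p[i] / degree[i];
                    }
                }
            }
            p = next;
        }
        return p;
    }

    public static void main(String[] args) {
        // Made-up sentences, used only to exercise the methods.
        String[] sentences = {
            "the storm hit the coast on monday",
            "the storm caused flooding on the coast",
            "flooding from the storm closed roads"
        };
        System.out.println(Arrays.toString(lexRank(sentences, 0.5, 0.15, 50)));
    }
}
```

With the threshold of 0.5 chosen here, the second sentence is linked to both of the others while they are not linked to each other, so it receives the highest score. A summarizer built on these scores would then pick top-ranked sentences until the maximum summary length is reached; that selection step is not shown.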
References
  1. LexRank: Graph-based Lexical Centrality as Salience in Text Summarization