Package dragon.ir.clustering

A package for document clustering and its evaluation


Interface Summary
Clustering Interface for text clustering methods

Class Summary
AbstractClustering Abstract class for document clustering
BasicKMean Basic KMeans clustering
BisectKMean Bisecting K-Means clustering
ClusteringEva Class for evaluating clustering results
DocCluster Data structure for a document cluster
DocClusterSet Data structure for a set of document clusters.
HierClustering Hierarchical clustering with options of single, complete, and average linkage
LinkKMean Link-based K-Means clustering algorithm
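The three linkage options listed for HierClustering can be illustrated with a minimal agglomerative sketch. This is not the toolkit's HierClustering API: the class name AgglomerativeSketch, its Linkage enum, and the use of a precomputed document-to-document distance matrix are assumptions made for illustration. Single linkage scores two clusters by their closest pair of members, complete linkage by their farthest pair, and average linkage by the mean over all member pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of agglomerative clustering; not the HierClustering API.
final class AgglomerativeSketch {
    enum Linkage { SINGLE, COMPLETE, AVERAGE }

    // Distance between two clusters (lists of document indices) under the chosen
    // linkage, computed from a precomputed document-to-document distance matrix.
    static double linkageDistance(List<Integer> a, List<Integer> b, double[][] d, Linkage linkage) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (int i : a) {
            for (int j : b) {
                double x = d[i][j];
                min = Math.min(min, x);
                max = Math.max(max, x);
                sum += x;
            }
        }
        switch (linkage) {
            case SINGLE:
                return min;                          // closest pair
            case COMPLETE:
                return max;                          // farthest pair
            default:
                return sum / (a.size() * b.size()); // mean over all pairs
        }
    }

    // Start with one cluster per document and repeatedly merge the two
    // closest clusters until only k clusters remain.
    static List<List<Integer>> cluster(double[][] d, int k, Linkage linkage) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < d.length; i++) {
            List<Integer> c = new ArrayList<>();
            c.add(i);
            clusters.add(c);
        }
        while (clusters.size() > k) {
            int bestA = 0;
            int bestB = 1;
            double best = Double.POSITIVE_INFINITY;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double dist = linkageDistance(clusters.get(a), clusters.get(b), d, linkage);
                    if (dist < best) {
                        best = dist;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            // bestA < bestB, so removing bestB does not shift bestA.
            clusters.get(bestA).addAll(clusters.remove(bestB));
        }
        return clusters;
    }
}
```

The exhaustive pair search makes this sketch cubic in the number of documents, which is acceptable for small examples but not for large collections.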

Package dragon.ir.clustering Description


Package Specification

The toolkit implements two common clustering approaches: agglomerative (hierarchical) clustering and K-Means clustering. Each approach has many variants that differ in the similarity measure they use. The toolkit encapsulates the details of the different similarity measures in implementations of two interfaces, Doc Distance and Cluster Model. The Doc Distance interface computes the distance between two documents and is designed for agglomerative clustering. The Cluster Model interface computes either the distance between a document and a cluster or the probability that a cluster model generates a document. To evaluate the quality of clustering results, use dragon.ir.clustering.ClusteringEva.
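The separation between the clustering loop and the similarity measure can be sketched as follows. ClusterModelSketch, CentroidModel, and KMeansSketch are hypothetical names; they are not the toolkit's Cluster Model interface or its BasicKMean class, and the centroid model with Euclidean distance is only one assumed choice of measure. The K-Means loop talks to the models solely through the interface, so it does not need to know how a model scores a document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a cluster model: scores how well a document fits a cluster.
interface ClusterModelSketch {
    void fit(List<double[]> members); // re-estimate the model from its current members
    double distance(double[] doc);    // smaller means the document fits the cluster better
}

// Assumed example model: Euclidean distance from the document to the cluster mean.
final class CentroidModel implements ClusterModelSketch {
    private double[] centroid;

    CentroidModel(double[] seed) { this.centroid = seed.clone(); }

    public void fit(List<double[]> members) {
        if (members.isEmpty()) return; // keep the previous centroid for an empty cluster
        double[] mean = new double[centroid.length];
        for (double[] m : members)
            for (int i = 0; i < mean.length; i++) mean[i] += m[i] / members.size();
        centroid = mean;
    }

    public double distance(double[] doc) {
        double s = 0.0;
        for (int i = 0; i < doc.length; i++) {
            double diff = doc[i] - centroid[i];
            s += diff * diff;
        }
        return Math.sqrt(s);
    }
}

final class KMeansSketch {
    // Assign each document to its nearest model, refit the models, and repeat
    // until no assignment changes or maxIter iterations have run.
    static int[] cluster(List<double[]> docs, List<ClusterModelSketch> models, int maxIter) {
        int[] label = new int[docs.size()];
        Arrays.fill(label, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            for (int d = 0; d < docs.size(); d++) {
                int best = 0;
                for (int c = 1; c < models.size(); c++)
                    if (models.get(c).distance(docs.get(d)) < models.get(best).distance(docs.get(d))) best = c;
                if (label[d] != best) {
                    label[d] = best;
                    changed = true;
                }
            }
            if (!changed) break;
            for (int c = 0; c < models.size(); c++) {
                List<double[]> members = new ArrayList<>();
                for (int d = 0; d < docs.size(); d++) if (label[d] == c) members.add(docs.get(d));
                models.get(c).fit(members);
            }
        }
        return label;
    }
}
```

Swapping in a different ClusterModelSketch implementation, for example one that scores a document by the negative log of its generative probability under the cluster model, changes the similarity measure without touching the K-Means loop.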