dragon.ir.classification
Class AbstractClassifier

java.lang.Object
  |
  +--dragon.ir.classification.AbstractClassifier
All Implemented Interfaces:
Classifier
Direct Known Subclasses:
LibSVMClassifier, NBClassifier, SVMLightClassifier

public abstract class AbstractClassifier
extends java.lang.Object
implements Classifier

Abstract base class that provides the functionality shared by document classifiers.

Copyright: Copyright (c) 2005

Company: IST, Drexel University

Version:
1.0
Author:
Davis Zhou

Field Summary
protected  java.lang.String[] arrLabel
           
protected  int classNum
           
protected  SparseMatrix doctermMatrix
           
protected  FeatureSelector featureSelector
           
protected  IndexReader indexReader
           
protected  DocClassSet validatingDocSet
           
 
Constructor Summary
AbstractClassifier()
           
AbstractClassifier(IndexReader indexReader)
           
AbstractClassifier(SparseMatrix doctermMatrix)
           
 
Method Summary
 DocClassSet classify(DocClass testingDocs)
          This method uses the trained model to classify the testing documents.
 DocClassSet classify(DocClassSet trainingDocSet, DocClass testingDocs)
          This method trains the classifier with the training document set and then uses the trained model to classify the testing documents.
 DocClassSet classify(DocClassSet trainingDocSet, DocClassSet validatingDocSet, DocClass testingDocs)
           
 int classify(IRDoc doc)
          Classifies one particular document.
 java.lang.String getClassLabel(int index)
          Gets the label of a given document category
 SparseMatrix getDocTermMatrix()
           
 FeatureSelector getFeatureSelector()
           
 IndexReader getIndexReader()
           
protected  Row getRow(int docIndex)
           
 void setFeatureSelector(FeatureSelector selector)
           
 void train(DocClassSet trainingDocSet, DocClassSet validatingDocSet)
          This method trains the classifier with the training document set and validating document set.
protected  void trainFeatureSelector(DocClassSet trainingSet)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface dragon.ir.classification.Classifier
classify, rank, saveModel, train
 

Field Detail

indexReader

protected IndexReader indexReader

doctermMatrix

protected SparseMatrix doctermMatrix

validatingDocSet

protected DocClassSet validatingDocSet

featureSelector

protected FeatureSelector featureSelector

arrLabel

protected java.lang.String[] arrLabel

classNum

protected int classNum

Constructor Detail

AbstractClassifier

public AbstractClassifier(IndexReader indexReader)

AbstractClassifier

public AbstractClassifier(SparseMatrix doctermMatrix)

AbstractClassifier

public AbstractClassifier()

Method Detail

getClassLabel

public java.lang.String getClassLabel(int index)
Description copied from interface: Classifier
Gets the label of a given document category

Specified by:
getClassLabel in interface Classifier
Parameters:
index - the index of the category
Returns:
the label of the category

getIndexReader

public IndexReader getIndexReader()
Specified by:
getIndexReader in interface Classifier
Returns:
the index reader the classifier is working on

getDocTermMatrix

public SparseMatrix getDocTermMatrix()

getFeatureSelector

public FeatureSelector getFeatureSelector()
Specified by:
getFeatureSelector in interface Classifier
Returns:
the feature selector the current classifier is using

setFeatureSelector

public void setFeatureSelector(FeatureSelector selector)
Specified by:
setFeatureSelector in interface Classifier
Parameters:
selector - the feature selector for the classifier.

classify

public DocClassSet classify(DocClassSet trainingDocSet,
                            DocClass testingDocs)
Description copied from interface: Classifier
This method trains the classifier with the training document set and then uses the trained model to classify the testing documents.

Specified by:
classify in interface Classifier
Parameters:
trainingDocSet - training document set
testingDocs - testing document set
Returns:
classified testing document set

classify

public DocClassSet classify(DocClassSet trainingDocSet,
                            DocClassSet validatingDocSet,
                            DocClass testingDocs)
Specified by:
classify in interface Classifier
Parameters:
trainingDocSet - the training document set
validatingDocSet - the validation document set, usually used to avoid overfitting
testingDocs - the testing document set
Returns:
classified testing document set

train

public void train(DocClassSet trainingDocSet,
                  DocClassSet validatingDocSet)
Description copied from interface: Classifier
This method trains the classifier with the training document set and validating document set.

Specified by:
train in interface Classifier
Parameters:
trainingDocSet - training document set
validatingDocSet - validating document set

classify

public DocClassSet classify(DocClass testingDocs)
Description copied from interface: Classifier
This method uses the trained model to classify the testing documents. The train method should be called before calling this method.

Specified by:
classify in interface Classifier
Parameters:
testingDocs - testing document set
Returns:
classified testing document set
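
The relationship between train and the classify overloads described above can be sketched as a template-method arrangement. This is a minimal, self-contained illustration and not the toolkit's source: ToyAbstractClassifier, KeywordToyClassifier, doTrain and doClassify are hypothetical names, plain strings and maps stand in for DocClassSet and DocClass, and both the delegation between overloads and the fail-fast check on an untrained model are assumptions of the sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an abstract classifier; strings replace the toolkit's types. */
abstract class ToyAbstractClassifier {
    private boolean trained = false;

    /** Subclass hook: build a model from labelled documents (label -> documents). */
    protected abstract void doTrain(Map<String, List<String>> training,
                                    Map<String, List<String>> validating);

    /** Subclass hook: assign a label to one document with the trained model. */
    protected abstract String doClassify(String doc);

    /** Mirrors train(trainingDocSet, validatingDocSet). */
    public final void train(Map<String, List<String>> training,
                            Map<String, List<String>> validating) {
        doTrain(training, validating);
        trained = true;
    }

    /** Mirrors classify(testingDocs): requires a model that is already trained. */
    public final Map<String, String> classify(List<String> testing) {
        if (!trained) {
            // Assumption of this sketch: fail fast. The real class may behave differently.
            throw new IllegalStateException("train must be called before classify");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (String doc : testing) {
            result.put(doc, doClassify(doc));
        }
        return result;
    }

    /** Mirrors the three-argument classify: train first, then classify. */
    public final Map<String, String> classify(Map<String, List<String>> training,
                                              Map<String, List<String>> validating,
                                              List<String> testing) {
        train(training, validating);
        return classify(testing);
    }

    /** Mirrors the two-argument classify: same as above without a validation set. */
    public final Map<String, String> classify(Map<String, List<String>> training,
                                              List<String> testing) {
        return classify(training, null, testing);
    }
}

/** Toy model: remembers which label each training word was first seen under. */
final class KeywordToyClassifier extends ToyAbstractClassifier {
    private final Map<String, String> wordToLabel = new HashMap<>();

    @Override
    protected void doTrain(Map<String, List<String>> training,
                           Map<String, List<String>> validating) {
        wordToLabel.clear(); // the validation set is ignored by this toy model
        for (Map.Entry<String, List<String>> entry : training.entrySet()) {
            for (String doc : entry.getValue()) {
                for (String word : doc.split("\\s+")) {
                    wordToLabel.putIfAbsent(word, entry.getKey());
                }
            }
        }
    }

    @Override
    protected String doClassify(String doc) {
        for (String word : doc.split("\\s+")) {
            String label = wordToLabel.get(word);
            if (label != null) {
                return label;
            }
        }
        return "unknown";
    }
}

final class TrainThenClassifyDemo {
    public static void main(String[] args) {
        Map<String, List<String>> training = new HashMap<>();
        training.put("sports", Arrays.asList("goal match"));
        training.put("finance", Arrays.asList("stock bond"));
        ToyAbstractClassifier classifier = new KeywordToyClassifier();
        System.out.println(classifier.classify(training, Arrays.asList("late goal", "bond yields")));
    }
}
```

In this arrangement a concrete subclass only supplies the model-specific steps, while the call order (train, then classify) is fixed in one place.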

classify

public int classify(IRDoc doc)
Description copied from interface: Classifier
Classifies one particular document.

Specified by:
classify in interface Classifier
Parameters:
doc - the IRDoc object that stores the index of the document to classify
Returns:
the index of the category of this document. The index starts from zero.
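
Because classify(IRDoc) returns a zero-based category index, a caller would typically pass that index to getClassLabel to obtain a readable label. The following sketch shows only that index-to-label step. ToyLabelTable and argMax are hypothetical names, the fields merely echo arrLabel and classNum from the field summary, and the highest-score decision rule and the bounds check are assumptions rather than documented behaviour.

```java
/** Hypothetical stand-in showing zero-based category indices and label lookup. */
final class ToyLabelTable {
    private final String[] arrLabel; // echoes the protected arrLabel field
    private final int classNum;      // echoes the protected classNum field

    ToyLabelTable(String... labels) {
        this.arrLabel = labels.clone();
        this.classNum = labels.length;
    }

    /** Returns the label for a zero-based category index. */
    String getClassLabel(int index) {
        if (index < 0 || index >= classNum) {
            // Assumption of this sketch: reject out-of-range indices explicitly.
            throw new IndexOutOfBoundsException("category index out of range: " + index);
        }
        return arrLabel[index];
    }

    /** Toy decision rule: the category with the highest score wins (first one on ties). */
    static int argMax(double[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("scores must not be empty");
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }
}

final class LabelLookupDemo {
    public static void main(String[] args) {
        ToyLabelTable labels = new ToyLabelTable("sports", "finance", "politics");
        int category = ToyLabelTable.argMax(new double[] {0.1, 0.7, 0.2});
        System.out.println(category + " -> " + labels.getClassLabel(category));
    }
}
```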

getRow

protected Row getRow(int docIndex)

trainFeatureSelector

protected void trainFeatureSelector(DocClassSet trainingSet)