dragon.ir.classification.featureselection
Class AbstractFeatureSelector

java.lang.Object
  |
  +--dragon.ir.classification.featureselection.AbstractFeatureSelector
All Implemented Interfaces:
FeatureSelector, java.io.Serializable
Direct Known Subclasses:
ChiFeatureSelector, DocFrequencySelector, InfoGainFeatureSelector, MutualInfoFeatureSelector, NullFeatureSelector

public abstract class AbstractFeatureSelector
extends java.lang.Object
implements FeatureSelector, java.io.Serializable

Abstract base class for feature selection.

Copyright: Copyright (c) 2005

Company: IST, Drexel University

Version:
1.0
Author:
Davis Zhou
See Also:
Serialized Form

Field Summary
protected  int[] featureMap
           
protected  int selectedFeatureNum
           
 
Constructor Summary
AbstractFeatureSelector()
           
 
Method Summary
protected  DoubleVector getClassPrior(DocClassSet docSet)
           
 int getSelectedFeatureNum()
           
protected abstract  int[] getSelectedFeatures(IndexReader indexReader, DocClassSet trainingSet)
           
protected abstract  int[] getSelectedFeatures(SparseMatrix doctermMatrix, DocClassSet trainingSet)
           
protected  IntDenseMatrix getTermDistribution(IndexReader indexReader, DocClassSet trainingSet)
           
protected  IntDenseMatrix getTermDistribution(SparseMatrix doctermMatrix, DocClassSet trainingSet)
           
protected  int[] getTermDocFrequency(SparseMatrix matrix, DocClassSet trainingSet)
           
 boolean isSelected(int originalFeatureIndex)
           
 int map(int originalFeatureIndex)
          Map the old feature index to the index in the new feature space.
 void setSelectedFeatures(int[] selectedFeatures)
          Manually set selected features.
 void train(IndexReader indexReader, DocClassSet trainingSet)
          This method chooses a subset of features for text classification.
 void train(SparseMatrix doctermMatrix, DocClassSet trainingSet)
          This method chooses a subset of features for text classification.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

featureMap

protected int[] featureMap

selectedFeatureNum

protected int selectedFeatureNum
Constructor Detail

AbstractFeatureSelector

public AbstractFeatureSelector()
Method Detail

getSelectedFeatures

protected abstract int[] getSelectedFeatures(IndexReader indexReader,
                                             DocClassSet trainingSet)

getSelectedFeatures

protected abstract int[] getSelectedFeatures(SparseMatrix doctermMatrix,
                                             DocClassSet trainingSet)

train

public void train(IndexReader indexReader,
                  DocClassSet trainingSet)
Description copied from interface: FeatureSelector
This method chooses a subset of features for text classification.

Specified by:
train in interface FeatureSelector
Parameters:
indexReader - the index reader a classifier is working on
trainingSet - the labeled training document set
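
As an illustration of what "choosing a subset of features" can mean, the following is a minimal, self-contained sketch of a document-frequency criterion. It is not the toolkit's implementation: the class DocFrequencySketch, its methods, and the minDocFrequency threshold are hypothetical, and a plain int[][] count matrix stands in for the IndexReader and DocClassSet arguments.

```java
import java.util.Arrays;

public class DocFrequencySketch {
    // Hypothetical helper: for each term (column), count the training
    // documents (rows) in which the term occurs at least once.
    static int[] termDocFrequency(int[][] docTerm) {
        int termNum = docTerm.length == 0 ? 0 : docTerm[0].length;
        int[] docFrequency = new int[termNum];
        for (int[] doc : docTerm) {
            for (int term = 0; term < termNum; term++) {
                if (doc[term] > 0) {
                    docFrequency[term]++;
                }
            }
        }
        return docFrequency;
    }

    // Hypothetical selection rule: keep every term whose document frequency
    // reaches minDocFrequency. Indices are emitted in ascending order.
    static int[] selectFeatures(int[][] docTerm, int minDocFrequency) {
        int[] docFrequency = termDocFrequency(docTerm);
        int count = 0;
        for (int frequency : docFrequency) {
            if (frequency >= minDocFrequency) {
                count++;
            }
        }
        int[] selected = new int[count];
        int next = 0;
        for (int term = 0; term < docFrequency.length; term++) {
            if (docFrequency[term] >= minDocFrequency) {
                selected[next++] = term;
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Three training documents over a four-term vocabulary.
        int[][] docTerm = {
            {1, 0, 2, 0},
            {0, 0, 1, 1},
            {3, 0, 1, 0}
        };
        // Terms 0 and 2 occur in at least two documents.
        System.out.println(Arrays.toString(selectFeatures(docTerm, 2))); // prints [0, 2]
    }
}
```

The selected indices come out in ascending order, which is the order setSelectedFeatures expects for its input array.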

train

public void train(SparseMatrix doctermMatrix,
                  DocClassSet trainingSet)
Description copied from interface: FeatureSelector
This method chooses a subset of features for text classification. Some implementations, such as DocFrequencySelector and InfoGainFeatureSelector, do not support this method.

Specified by:
train in interface FeatureSelector
Parameters:
doctermMatrix - the document-term matrix a classifier is working on
trainingSet - the labeled training document set

setSelectedFeatures

public void setSelectedFeatures(int[] selectedFeatures)
Description copied from interface: FeatureSelector
Manually set selected features. Usually used in the testing stage.

Specified by:
setSelectedFeatures in interface FeatureSelector
Parameters:
selectedFeatures - each element contains the index of a selected feature in the old feature space. The selected features must be in ascending order in the input array.
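
The sketch below shows one way an ascending array of selected features could be turned into a lookup table like the protected featureMap field. It assumes, without confirmation from this page, that the table holds the new index for each selected feature and -1 for every other feature. The class FeatureMapSketch, the method buildFeatureMap, and the originalFeatureNum parameter are hypothetical.

```java
import java.util.Arrays;

public class FeatureMapSketch {
    // Hypothetical helper: builds an old-index -> new-index table.
    // Unselected features are marked with -1 (an assumption of this sketch).
    static int[] buildFeatureMap(int[] selectedFeatures, int originalFeatureNum) {
        int[] featureMap = new int[originalFeatureNum];
        Arrays.fill(featureMap, -1);
        for (int newIndex = 0; newIndex < selectedFeatures.length; newIndex++) {
            // Enforce the documented precondition: ascending order.
            if (newIndex > 0 && selectedFeatures[newIndex] <= selectedFeatures[newIndex - 1]) {
                throw new IllegalArgumentException("selected features must be in ascending order");
            }
            featureMap[selectedFeatures[newIndex]] = newIndex;
        }
        return featureMap;
    }

    public static void main(String[] args) {
        // Features 1, 4 and 5 are kept out of an original space of 7 features.
        int[] featureMap = buildFeatureMap(new int[]{1, 4, 5}, 7);
        System.out.println(Arrays.toString(featureMap)); // prints [-1, 0, -1, -1, 1, 2, -1]
    }
}
```

Because the input is ascending, the new indices preserve the relative order of the original features.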

isSelected

public boolean isSelected(int originalFeatureIndex)
Specified by:
isSelected in interface FeatureSelector
Parameters:
originalFeatureIndex - the index of the feature in the old feature space
Returns:
true if the given feature is selected for text classification

map

public int map(int originalFeatureIndex)
Description copied from interface: FeatureSelector
Map the old feature index to the index in the new feature space.

Specified by:
map in interface FeatureSelector
Parameters:
originalFeatureIndex - the index of the feature before feature selection
Returns:
the index of the feature in the new space. If the feature is not selected, -1 is returned.
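
To make the contract of map and isSelected concrete, here is a small stand-alone sketch that re-indexes a document's terms into the reduced feature space. It only mirrors the documented behavior (new index for a selected feature, -1 otherwise). The class MapLookupSketch, the project helper, and the treatment of out-of-range indices as unselected are assumptions of this example, not details taken from the toolkit.

```java
import java.util.Arrays;

public class MapLookupSketch {
    // Returns the new index of a feature, or -1 if it was not selected.
    // Out-of-range indices are treated as unselected (an assumption).
    static int map(int[] featureMap, int originalFeatureIndex) {
        if (originalFeatureIndex < 0 || originalFeatureIndex >= featureMap.length) {
            return -1;
        }
        return featureMap[originalFeatureIndex];
    }

    // A feature is selected exactly when it maps to a non-negative index.
    static boolean isSelected(int[] featureMap, int originalFeatureIndex) {
        return map(featureMap, originalFeatureIndex) >= 0;
    }

    // Hypothetical helper: translate a document's original term indices
    // into the new feature space, dropping unselected terms.
    static int[] project(int[] featureMap, int[] termIndices) {
        int[] buffer = new int[termIndices.length];
        int count = 0;
        for (int term : termIndices) {
            int newIndex = map(featureMap, term);
            if (newIndex >= 0) {
                buffer[count++] = newIndex;
            }
        }
        return Arrays.copyOf(buffer, count);
    }

    public static void main(String[] args) {
        // Features 1, 4 and 5 were selected from an original space of 7.
        int[] featureMap = {-1, 0, -1, -1, 1, 2, -1};
        System.out.println(isSelected(featureMap, 4));  // prints true
        System.out.println(map(featureMap, 2));         // prints -1
        System.out.println(Arrays.toString(project(featureMap, new int[]{0, 1, 5, 9}))); // prints [0, 2]
    }
}
```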

getSelectedFeatureNum

public int getSelectedFeatureNum()
Specified by:
getSelectedFeatureNum in interface FeatureSelector
Returns:
the number of selected features

getClassPrior

protected DoubleVector getClassPrior(DocClassSet docSet)

getTermDocFrequency

protected int[] getTermDocFrequency(SparseMatrix matrix,
                                    DocClassSet trainingSet)

getTermDistribution

protected IntDenseMatrix getTermDistribution(IndexReader indexReader,
                                             DocClassSet trainingSet)

getTermDistribution

protected IntDenseMatrix getTermDistribution(SparseMatrix doctermMatrix,
                                             DocClassSet trainingSet)