I am interested in a wide range of theoretical and practical issues in statistical
pattern recognition and probabilistic models, especially one-class
classification problems and active learning.
One-class classification
When only samples of one class are easily accessible in a classification
problem, the problem is called a one-class classification
problem. Many standard classifiers, such as back-propagation neural
networks, fail on this data because they need examples of every class
for training. Other techniques, such as k-means clustering or the
nearest neighbor classifier, can be applied after some minor changes.
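As an illustration of such a minor change, the sketch below turns the nearest neighbor rule into a one-class classifier. It is one possible adaptation and not necessarily the one meant above: an object is accepted when its distance to the nearest training target stays below a threshold, and the threshold is taken from the leave-one-out nearest neighbor distances within the training set. The names `fit_nn_threshold`, `accept` and `reject_fraction` are hypothetical, and the Gaussian data is synthetic.

```python
import numpy as np

def fit_nn_threshold(targets, reject_fraction=0.05):
    """Pick a distance threshold from leave-one-out nearest neighbor distances.

    reject_fraction (an assumed parameter) is the share of training targets
    whose nearest neighbor lies farther away than the threshold.
    """
    diffs = targets[:, None, :] - targets[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)       # an object is not its own neighbor
    loo_nn = dists.min(axis=1)            # distance to the nearest other target
    return np.quantile(loo_nn, 1.0 - reject_fraction)

def accept(targets, threshold, objects):
    """Return True for objects whose nearest training target is close enough."""
    diffs = objects[:, None, :] - targets[None, :, :]
    nn = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    return nn <= threshold

# Synthetic target set: only one class is available for training.
rng = np.random.default_rng(0)
targets = rng.normal(size=(200, 2))
threshold = fit_nn_threshold(targets)

# One object near the target data and one far away from it.
test_objects = np.array([[0.0, 0.0], [8.0, 8.0]])
accepted = accept(targets, threshold, test_objects)
```

No outlier examples are used anywhere in training; the only tuning knob is the fraction of target objects one is willing to reject.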
In one-class classification, one class of the data,
called the target set, has to be distinguished from all other
possible objects, called outliers. A description of the target set
should be constructed such that objects not originating from the target
set are rejected by it. It is assumed that almost no
examples of the outlier class are available.
In general, the problem of one-class classification is harder than the
problem of normal two-class classification. For normal classification
the decision boundary is supported from both sides by examples of each
of the classes. Because in the case of one-class classification only
one set of data is easily available, only one side of the boundary is
covered. On the basis of one class it is hard to decide how
tightly the boundary should fit around the data in each of the
directions.
The absence of example outlier objects also makes it very hard to
estimate the error that the classifier makes. The error of the first
kind, target objects that are classified as
outlier objects, can be estimated on the training set. The error of
the second kind, outlier objects that are
classified as target objects, can be estimated only by making an
assumption about the distribution of the outliers. As long
as no example outlier objects are available, we assume that the
outliers are uniformly distributed in the feature space. It follows
directly that minimizing the chance of accepting an outlier object
amounts to minimizing the volume covered by the one-class classifier
in the feature space.
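The link between the two errors and the covered volume can be made concrete with a small simulation. The sketch below is a minimal example under stated assumptions: the data description is a plain hypersphere around the target mean (chosen only for simplicity), the outliers are drawn uniformly from a box around the targets (the box and its margin are hypothetical choices), and all data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic target set and a simple spherical description around its mean.
targets = rng.normal(size=(500, 2))
center = targets.mean(axis=0)
target_dist = np.linalg.norm(targets - center, axis=1)

# Assumed outlier distribution: uniform in a box enclosing the targets.
low = targets.min(axis=0) - 1.0
high = targets.max(axis=0) + 1.0
outliers = rng.uniform(low, high, size=(20000, 2))
outlier_dist = np.linalg.norm(outliers - center, axis=1)

def errors(radius):
    """Estimate both error kinds for a sphere of the given radius."""
    first_kind = np.mean(target_dist > radius)     # targets rejected
    second_kind = np.mean(outlier_dist <= radius)  # uniform outliers accepted
    return first_kind, second_kind

# A tight boundary rejects 10% of the training targets,
# a loose boundary accepts all of them.
tight = errors(np.quantile(target_dist, 0.90))
loose = errors(np.quantile(target_dist, 1.00))
```

Because the outliers are uniform, the estimated error of the second kind is just the fraction of the box volume covered by the sphere, so shrinking the sphere lowers it at the cost of rejecting more target objects.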
Using the uniform distribution for the outlier objects implicitly
assumes that the objects are represented by 'good' features. This
means that outlier objects lie around the target class and
not inside it. When there is still some overlap
between the target objects and outlier objects, the representation of
the objects should be changed such that the distinction becomes
easier.