
I am using Libsvm for outlier detection (from Java), but I need a probability estimate, not just a label. I traced the code and found that this is not possible. In particular, in the function svm_predict_values(..) I see the following code:

if(model.param.svm_type == svm_parameter.ONE_CLASS)
        return (sum>0)?1:-1;
else
        return sum;

I understand that a one-class SVM tries to estimate the support of some probability distribution given samples (data points) from the "normal" class. Given a new data point, and given that the model has learned the support of the normal class distribution, can I get an estimate of the probability that the new data point is "normal" rather than an outlier? It seems that this is not possible, and that is why Libsvm thresholds the sum above and returns only a membership label, but I do not understand why. If it is possible to get a probability estimate from a one-class SVM, I do not see how to do that in Libsvm after spending a lot of time reading the code.

The reason I went this route is that I do not believe kernel density estimation would work well in a high-dimensional setting, but maybe the SVM is prone to the same issue.

Kai

1 Answer


I understand that one-class SVM tries to estimate the support of some probability distribution given samples or data points from the "normal" class

The problem is that this sentence is false for an SVM. In general, yes, this would be a nice probabilistic approach to building a classifier, taken by models like logistic regression, neural nets, and many others. However, SVM is not one of them: there is no proper probabilistic interpretation of an SVM; it does not really construct a probability distribution but rather directly looks for a good decision rule. There are more probabilistic alternatives, like Relevance Vector Machines (RVM), which are, however, non-convex. The only reason a binary SVM can provide you with probability estimates is a small "cheat" used in many implementations, originated by Platt, where you simply fit another, probabilistic model on top of the SVM - typically logistic regression on top of the SVM projection.

So, what can you do? You can either go for another, more probabilistic model, or use a similar cheat: first project your data through the SVM (this is what "sum" is in the code provided) and then fit a logistic regression on top of it, which will be your probability estimate.
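A minimal sketch of that cheat in Java, under two assumptions: first, that the libsvm Java classes (svm, svm_model, svm_node) behave as in recent versions, where svm_predict_values writes the un-thresholded sum into its dec_values output even for ONE_CLASS (only the return value is thresholded); second, that you have some calibration points with labels (1 = normal, 0 = outlier) to fit the sigmoid on. The sigmoid P(normal | f) = 1 / (1 + exp(a*f + b)) is fitted here by plain gradient descent; Platt's original method uses a Newton-style iteration, so treat this as a sketch rather than a reference implementation.

import libsvm.svm;
import libsvm.svm_model;
import libsvm.svm_node;

public class OneClassPlatt {

    // Raw decision value (the "sum" in the question): svm_predict_values
    // stores it in dec_values[0] even though its return value is thresholded.
    static double decisionValue(svm_model model, svm_node[] x) {
        double[] dec = new double[1];
        svm.svm_predict_values(model, x, dec);
        return dec[0];
    }

    // Fit P(normal | f) = 1 / (1 + exp(a*f + b)) on calibration decision
    // values f[] with targets t[] (1.0 = normal, 0.0 = outlier) by gradient
    // descent on the negative log-likelihood (simplified Platt scaling).
    static double[] fitSigmoid(double[] f, double[] t) {
        double a = -1.0, b = 0.0, lr = 1e-3;
        for (int iter = 0; iter < 50000; iter++) {
            double ga = 0.0, gb = 0.0;
            for (int i = 0; i < f.length; i++) {
                double p = 1.0 / (1.0 + Math.exp(a * f[i] + b));
                ga += f[i] * (t[i] - p);   // gradient of -logLik w.r.t. a
                gb += t[i] - p;            // gradient of -logLik w.r.t. b
            }
            a -= lr * ga;
            b -= lr * gb;
        }
        return new double[] { a, b };
    }

    // Probability that a new point is "normal" under the fitted sigmoid.
    static double probNormal(svm_model model, svm_node[] x, double[] ab) {
        double fx = decisionValue(model, x);
        return 1.0 / (1.0 + Math.exp(ab[0] * fx + ab[1]));
    }
}

Once a and b are fitted on the calibration decision values, probNormal(model, x, ab) gives the probability estimate for a new point, without touching libsvm's internals.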

lejlot
  • ok, but libsvm provides this "cheat" for multi-class svm. So the question is why is it not provided for one-class svm? – Kai Dec 21 '16 at 16:22
  • There is no mathematical reason for not using it in one-class SVM, so why is it not implemented? One reason might be that, usually, in one-class classification a probability estimate means providing P(x|positive) rather than P(positive|x) (which the Platt scaling "cheat" gives you), and there is no nice way to convert between these quantities (since a one-class SVM has no model of P(x)). But the more probable reason is simply "not enough people asking for it", since one-class SVM is a minor part of libsvm. – lejlot Dec 21 '16 at 17:32
  • ok, that's a good answer. Thank you. In my application I actually need p(positive | x), so the enhanced Platt estimate in libsvm - if it were provided for one-class - would have been perfect. I did find a workaround, though, which may be the answer. It turns out that the parameter nu is essentially an upper bound on the fraction of training examples rejected. So a new test point will have to be "closer" to "normal", or positiveness, than those rejected training examples (even though they belong to the positive class). This is not exactly a calibrated probability, but it can act in lieu of p(positive | x); a rough sketch of this workaround appears after this thread. – Kai Dec 21 '16 at 17:44
  • @Kai fitting logistic regression would give a true probability and would not be much more complex than a completely heuristic measure based on closeness. In the end, in order to have a probability you need to fit some distribution, and being able to say "the boundary goes more or less here" does not give you the equation for the slope etc. Furthermore, "nu" is, as you said, an upper bound. It does not have to be hit, so you should test empirically for your data how many points are actually rejected (and then fitting LR is not more complicated than that :-)). – lejlot Dec 21 '16 at 20:17
  • you're right, although I don't know how to use a binary or multi-class classifier for a one-class problem, but I can research that. Also, the practical limitation I have is that I need to use an API (from Java), I don't want to implement one myself. I don't know of an API that will do what I want, p(positive | x), that's why I went to libsvm and one-class svm. I will look though, but please do point me to any you know of. – Kai Dec 21 '16 at 20:26
  • In order to fit LR, all you need are the distances to the hyperplane (the same thing your heuristic needs), since what I am saying is literally: learn a mapping from these distances to a class through LR (one-dimensional classification), re-using your training set. This way you have a valid probability estimate (this is literally Platt scaling) and you do not need to "dig deep"; you just need access to the distance from the hyperplane (and some training data to calibrate the LR - it does not even have to be the original training set). – lejlot Dec 21 '16 at 20:28
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/131189/discussion-between-kai-and-lejlot). – Kai Dec 21 '16 at 20:31
  • I am currently working with OC-SVM. I only have examples from one class and I would like to give a probability instead of a number (-1, 1). You were talking about fitting an LR on the distances to the hyperplane, but how am I going to do that if I only have the positive-class distances? Thank you!! – Ignacio Peletier Jul 16 '19 at 12:46
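For completeness, a rough sketch (using the same assumed libsvm Java API as above) of the nu-based workaround Kai describes in the comments: score a new point by the empirical rank of its decision value among the training decision values. Since nu bounds the fraction of rejected training examples, a score well above nu suggests the point looks at least as "normal" as the accepted training data. As discussed in the thread, this is a heuristic ordering, not a calibrated probability.

import java.util.Arrays;
import libsvm.svm;
import libsvm.svm_model;
import libsvm.svm_node;

public class NuRankScore {

    // Decision values of the training points (ideally the same data the
    // one-class model was trained on), sorted ascending.
    static double[] trainingDecisionValues(svm_model model, svm_node[][] train) {
        double[] vals = new double[train.length];
        double[] dec = new double[1];
        for (int i = 0; i < train.length; i++) {
            svm.svm_predict_values(model, train[i], dec);
            vals[i] = dec[0];
        }
        Arrays.sort(vals);
        return vals;
    }

    // Fraction of training decision values that lie below the new point's
    // value. Roughly: scores below nu fall among the rejected training
    // examples, scores well above nu look "normal". Not a calibrated probability.
    static double rankScore(svm_model model, svm_node[] x, double[] sortedTrainVals) {
        double[] dec = new double[1];
        svm.svm_predict_values(model, x, dec);
        int idx = Arrays.binarySearch(sortedTrainVals, dec[0]);
        if (idx < 0) idx = -idx - 1;   // convert to insertion point
        return (double) idx / sortedTrainVals.length;
    }
}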