
I'm new to scikit-learn, and to SVM methods in general. My data set works well with scikit-learn's OneClassSVM for detecting outliers. I train the OneClassSVM on observations that are all 'inliers', then use predict() to generate binary inlier/outlier predictions on my test set.

To take my analysis further, I'd like a probability for each new observation in my test set, e.g. the probability that each new observation is an outlier. Other classification methods in scikit-learn accept the parameter probability=True to compute this, but OneClassSVM does not. Is there an easy way to get these results?

justin
  • Hi, I know this is an old question, but I figured you must have solved this or found a workaround. I'm stuck at the same point you were 3 years ago, and I need the probabilities to calculate the AUC-ROC. I have posted this Stack Overflow question as well: https://stackoverflow.com/questions/49931965/auc-roc-for-a-none-ranking-classifier-such-as-osvm – mousa alsulaimi Apr 20 '18 at 11:19

3 Answers


I was searching for an answer to the same question until I reached this page. After being stuck for some time, I went back to the original LIBSVM package, since scikit-learn's OneClassSVM is based on the LIBSVM implementation, as stated here.

On the LIBSVM main page, option '-b', which turns on probability output for some SVM variants, is described as follows:

-b probability_estimates: whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)

In other words, the one-class SVM, which is neither an SVC nor an SVR, has no implementation of probability estimation.

If I try to force this option (-b) through the LIBSVM command-line interface, for example (-s 2 selects one-class SVM, -t 2 the RBF kernel):

./svm-train -s 2 -t 2 -b 1 heart_scale

I receive the following error message:

ERROR: one-class SVM probability output not supported yet

In summary, LIBSVM does not yet support this much-wanted output, so scikit-learn does not offer it for the moment. I hope they add this functionality in the near future, and that someone updates this thread when they do.
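You can also check this from the scikit-learn side. The minimal sketch below assumes a recent scikit-learn release, and its variable names are only illustrative. SVC accepts probability=True and then exposes predict_proba. OneClassSVM has neither the parameter nor the method.

```python
# Minimal sketch: compare the probability interface of SVC and OneClassSVM.
# Nothing is fitted here; we only inspect the estimator objects.
from sklearn.svm import SVC, OneClassSVM

# SVC accepts probability=True, which makes predict_proba available
# (it is backed by Platt scaling computed during fit).
svc_has_proba = hasattr(SVC(probability=True), "predict_proba")

# OneClassSVM does not define predict_proba at all ...
ocsvm_has_proba = hasattr(OneClassSVM(), "predict_proba")

# ... and its constructor rejects a probability argument.
try:
    OneClassSVM(probability=True)
    accepts_probability = True
except TypeError:
    accepts_probability = False

print("SVC(probability=True) has predict_proba:", svc_has_proba)
print("OneClassSVM has predict_proba:", ocsvm_has_proba)
print("OneClassSVM accepts probability=True:", accepts_probability)
```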

soufanom

OneClassSVM provides decision function scores. In theory, these are the signed distance to the decision boundary between normal points and anomalies: positive for inliers, negative for outliers. OCSVM does unsupervised classification, so the algorithm defines anomalies relative to the origin. It finds a hyperplane that separates the training data from the origin in feature space with maximum margin, and points on the origin's side count as anomalies (see Schölkopf et al.'s NIPS paper: https://papers.nips.cc/paper/1999/file/8725fb777f25776ffa9076e44fcfd776-Paper.pdf).

TLDR: use

clf.decision_function(samples) * (-1)

as anomaly scores, where higher means more anomalous. Instead of binary labels, you get a continuous spread of scores.
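Here is a minimal sketch of this suggestion on made-up 2-D data. The data, nu=0.05 and the RBF kernel settings are assumptions for illustration only. It also shows that these scores are enough for the ROC AUC asked about in the comment on the question, because roc_auc_score only needs a ranking of the samples, not calibrated probabilities.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)

# Hypothetical training data: inliers only, clustered around the origin.
X_train = 0.3 * rng.randn(200, 2)

# Hypothetical test data: 50 inliers plus 20 outliers on a ring of radius 3.
X_test_in = 0.3 * rng.randn(50, 2)
angles = rng.uniform(0, 2 * np.pi, size=20)
X_test_out = 3.0 * np.c_[np.cos(angles), np.sin(angles)]
X_test = np.vstack([X_test_in, X_test_out])
y_test = np.r_[np.zeros(50), np.ones(20)]  # 1 = outlier

clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

# decision_function is positive for inliers and negative for outliers,
# so negating it gives a score where larger means "more anomalous".
scores = -clf.decision_function(X_test)

# ROC AUC only uses the ordering of the scores.
auc = roc_auc_score(y_test, scores)
print(f"ROC AUC: {auc:.3f}")
```

These negated scores are not probabilities. They are unbounded, and only their ordering is meaningful.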

Dharman
partizanos

Since version 3.31, libsvm supports probabilistic outputs for one-class SVM: https://www.csie.ntu.edu.tw/~cjlin/libsvm/#nuandone
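That probability support lives in LIBSVM itself. scikit-learn's OneClassSVM still exposes no probability option. Suppose you need probability-like outputs inside scikit-learn and also have a small labeled set that contains some known outliers. One swapped-in workaround is Platt-style calibration: fit a 1-D logistic regression on the negated decision_function scores. This is not LIBSVM's one-class method. The sketch below uses made-up data, and make_labeled is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(1)

# Hypothetical unlabeled inliers used to fit the one-class model.
X_train = 0.3 * rng.randn(200, 2)


def make_labeled(n_in, n_out):
    """Hypothetical helper: inliers near the origin, outliers on a ring of radius 3."""
    X_in = 0.3 * rng.randn(n_in, 2)
    angles = rng.uniform(0, 2 * np.pi, size=n_out)
    X_out = 3.0 * np.c_[np.cos(angles), np.sin(angles)]
    X = np.vstack([X_in, X_out])
    y = np.r_[np.zeros(n_in), np.ones(n_out)]  # 1 = outlier
    return X, y


# A small labeled calibration set and a separate test set.
X_cal, y_cal = make_labeled(40, 15)
X_test, y_test = make_labeled(40, 15)

ocsvm = OneClassSVM(gamma="scale", nu=0.05).fit(X_train)

# Platt-style calibration: a 1-D logistic regression mapping the negated
# decision_function score to P(outlier), fitted on the calibration set only.
cal_scores = -ocsvm.decision_function(X_cal).reshape(-1, 1)
calibrator = LogisticRegression().fit(cal_scores, y_cal)

# Column 1 corresponds to class 1.0 (outlier) in calibrator.classes_.
test_scores = -ocsvm.decision_function(X_test).reshape(-1, 1)
p_outlier = calibrator.predict_proba(test_scores)[:, 1]
print(np.round(p_outlier[:5], 3))
```

Keep the calibration data separate from the data you evaluate on. The resulting probabilities are only as trustworthy as that labeled set, and in a purely unsupervised setting with no labeled outliers this option isn't available.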

Itsuarpok
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – user16217248 May 31 '23 at 04:03