
How can I get the probability that a sample belongs to the class predicted by the predict() function of scikit-learn's Support Vector Machine?

>>> print clf.predict([fv])
[5]

Is there a function for this?

postgres

5 Answers


Definitely read this section of the docs, as there are some subtleties involved. See also: Scikit-learn predict_proba gives wrong answers.

Basically, if you have a multi-class problem with plenty of data, predict_proba, as suggested earlier, works well. Otherwise, you may have to make do with an ordering from decision_function, which yields confidence scores rather than probabilities.

Here's a nice pattern for using predict_proba to get a dictionary or list of classes vs. probabilities:

from sklearn import svm

model = svm.SVC(probability=True)
model.fit(X, Y)
results = model.predict_proba(test_data)[0]

# gets a dictionary of {'class_name': probability}
prob_per_class_dictionary = dict(zip(model.classes_, results))

# gets a list of ['most_probable_class', ..., 'least_probable_class']
# (a list comprehension instead of map(), so it also works on Python 3)
results_ordered_by_probability = [
    cls for cls, _ in sorted(zip(model.classes_, results),
                             key=lambda pair: pair[1], reverse=True)
]
Alex

Use clf.predict_proba([fv]) to obtain a list with predicted probabilities per class. Note that this function is not available for all classifiers.

Regarding your comment, consider the following:

>>> prob = [0.01357713, 0.00662571, 0.00782155, 0.3841413,
...         0.07487401, 0.09861277, 0.00644468, 0.40790285]
>>> sum(prob)
1.0

The probabilities sum to 1.0, so multiply by 100 to get percentages.
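As a quick sketch of that conversion, reusing the probability vector above:

```python
prob = [0.01357713, 0.00662571, 0.00782155, 0.3841413,
        0.07487401, 0.09861277, 0.00644468, 0.40790285]

# Scale each class probability to a percentage, rounded for display
percentages = [round(p * 100, 2) for p in prob]
```

The fourth and eighth entries come out at 38.41% and 40.79%, matching the two largest probabilities in the vector.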

Bastiaan van den Berg

When creating the SVC instance, enable probability estimates by setting probability=True:

http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

Then call fit as usual, followed by predict_proba([fv]).
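Putting those two steps together, a minimal sketch (the toy training data and feature vector here are made up for illustration):

```python
from sklearn.svm import SVC

# Hypothetical toy data: two features, two well-separated classes
X = [[0, 0], [0, 1], [1, 0], [0.2, 0.1], [0.1, 0.3],
     [3, 3], [3, 4], [4, 3], [3.2, 3.1], [3.1, 3.3]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

clf = SVC(probability=True)  # enables Platt scaling for predict_proba
clf.fit(X, y)

fv = [3, 3.5]  # hypothetical feature vector
probs = clf.predict_proba([fv])[0]  # one probability per class, summing to 1
```

The columns of predict_proba line up with clf.classes_, so zip the two to see which probability belongs to which class.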

ogrisel
  • It returns a predicted-values array "[[ 0.01357713 0.00662571 0.00782155 0.3841413 0.07487401 0.09861277 0.00644468 0.40790285]]", not probabilities like "class 8: 80%, class 4: 40%" – postgres Feb 22 '13 at 12:00
  • Well this is exactly what you are looking for: 40% for class 7 (assuming the first class is "class 0"), 38% for class 3, 10% for class 5 and 7% for class 4. – ogrisel Feb 24 '13 at 14:59

For clarity, here is the relevant passage from the scikit-learn documentation on SVMs:

Needless to say, the cross-validation involved in Platt scaling is an expensive operation for large datasets. In addition, the probability estimates may be inconsistent with the scores, in the sense that the “argmax” of the scores may not be the argmax of the probabilities. (E.g., in binary classification, a sample may be labeled by predict as belonging to a class that has probability <½ according to predict_proba.) Platt’s method is also known to have theoretical issues. If confidence scores are required, but these do not have to be probabilities, then it is advisable to set probability=False and use decision_function instead of predict_proba.
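To illustrate that last recommendation, a sketch using decision_function for confidence scores instead of probabilities (the toy data is hypothetical):

```python
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated binary classes
X = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(probability=False)  # the default; no Platt scaling, no predict_proba
clf.fit(X, y)

# Signed distance to the separating hyperplane: the sign picks the class,
# the magnitude serves as a confidence score (not a probability)
scores = clf.decision_function([[3, 3.5], [0, 0.5]])
```

For binary problems decision_function returns one score per sample; a sample near the class-1 cluster gets a higher score than one near the class-0 cluster.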

For other classifiers such as Random Forest, AdaBoost, and Gradient Boosting, it should be fine to use predict_proba in scikit-learn.

beahacker

This is one way of obtaining probability-like scores:

from sklearn.svm import SVC

svc = SVC(probability=True)
preds_svc = svc.fit(X_train, y_train).predict(X_test)

# The decision function tells us on which side of the hyperplane generated
# by the classifier we are (and how far we are away from it)
probs_svc = svc.decision_function(X_test)

# Min-max scale the scores into [0, 1]
# (note: these are rescaled scores, not true probabilities)
probs_svc = (probs_svc - probs_svc.min()) / (probs_svc.max() - probs_svc.min())

  • probs_svc.min() and probs_svc.max() will change on each inference, so the computed probability scores are not consistent; it is recommended to fix a min and a max value and compute the score that way – Argho Chatterjee Mar 29 '23 at 11:21
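Along the lines of that comment, a sketch with fixed bounds; the bound values here are hypothetical and would in practice be estimated once from the decision-score distribution on the training set:

```python
# Hypothetical fixed bounds, chosen once (e.g. from training-set decision
# scores) so the scaling stays consistent across inference batches
FIXED_MIN, FIXED_MAX = -2.0, 2.0

def scale_score(score, lo=FIXED_MIN, hi=FIXED_MAX):
    """Clamp a decision score to [lo, hi], then min-max scale it to [0, 1]."""
    score = max(lo, min(hi, score))
    return (score - lo) / (hi - lo)
```

With these bounds, a score of 0.0 maps to 0.5, and scores outside the bounds are clamped to 0 or 1 rather than shifting the scale.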