I'm using OneVsRestClassifier on a multiclass problem with svm.SVC as the base estimator. The argmax of the predict_proba() output does not match the class returned by predict():

sample

Is there some normalization going on in the background? How do I get predict_proba() and predict() to match?

1 Answer

According to scikit-learn's SVC documentation on multi-class classification, the output of predict can disagree with the argmax of predict_proba:

The decision_function method of SVC and NuSVC gives per-class scores for each sample (or a single score per sample in the binary case). When the constructor option probability is set to True, class membership probability estimates (from the methods predict_proba and predict_log_proba) are enabled. In the binary case, the probabilities are calibrated using Platt scaling: logistic regression on the SVM’s scores, fit by an additional cross-validation on the training data. In the multiclass case, this is extended as per Wu et al. (2004).

Needless to say, the cross-validation involved in Platt scaling is an expensive operation for large datasets. In addition, the probability estimates may be inconsistent with the scores, in the sense that the “argmax” of the scores may not be the argmax of the probabilities. (E.g., in binary classification, a sample may be labeled by predict as belonging to a class that has probability <½ according to predict_proba.) Platt’s method is also known to have theoretical issues. If confidence scores are required, but these do not have to be probabilities, then it is advisable to set probability=False and use decision_function instead of predict_proba.

You cannot get them to match using an SVC. If you need probabilities, you can try another model. If you do not need probabilities, you can use decision_function instead, as the documentation suggests (see here for more details).
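To make the decision_function route concrete, here is a minimal sketch. It assumes the iris dataset as a stand-in (the question's data is not shown) and default SVC settings. As far as I can tell, OneVsRestClassifier.predict picks the class whose binary estimator has the largest decision score, so the argmax of decision_function should agree with predict, while the argmax of the Platt-scaled predict_proba may not.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Illustrative data only; the question's dataset is not shown.
X, y = load_iris(return_X_y=True)

# probability=True enables predict_proba on each binary SVC (Platt scaling).
ovr = OneVsRestClassifier(SVC(probability=True, random_state=0)).fit(X, y)

pred = ovr.predict(X)
scores = ovr.decision_function(X)  # one column of scores per class
proba = ovr.predict_proba(X)       # calibrated, then normalized per row

# Map column indices back to class labels before comparing.
from_scores = ovr.classes_[np.argmax(scores, axis=1)]
from_proba = ovr.classes_[np.argmax(proba, axis=1)]

print("agreement with argmax(decision_function):", np.mean(pred == from_scores))
print("agreement with argmax(predict_proba):    ", np.mean(pred == from_proba))
```

The second agreement rate depends on the data and may or may not be below 1.0. The scores are not probabilities, so use them only for ranking classes, not as confidence levels.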

FlorianGD
  • Thank you. I ended up using another model, but CalibratedClassifierCV is also an option if you want SVC and the probabilities – UNIrene Oct 23 '19 at 10:09
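A rough sketch of the CalibratedClassifierCV option mentioned in the comment, again assuming the iris dataset as placeholder data and sigmoid calibration with 5-fold cross-validation (both are illustrative choices, not from the thread). Here the SVC is left with probability=False and the wrapper supplies the probabilities. To my understanding, the wrapper's predict is defined as the argmax of its own predict_proba, so the two should be consistent.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Illustrative data only; the question's dataset is not shown.
X, y = load_iris(return_X_y=True)

# The estimator is passed positionally because its keyword name
# differs between scikit-learn versions.
calibrated = CalibratedClassifierCV(SVC(), method="sigmoid", cv=5).fit(X, y)

proba = calibrated.predict_proba(X)
pred = calibrated.predict(X)

# Map column indices back to class labels before comparing.
from_proba = calibrated.classes_[np.argmax(proba, axis=1)]
print("predict matches argmax(predict_proba):", np.array_equal(pred, from_proba))
```

Note that the calibrated predictions can differ from those of a plain SVC fitted on the same data, since the label now comes from the calibrated probabilities rather than the raw SVM scores.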