I'm working on a binary classification problem. I fit a logistic regression model and a support vector machine (both imported from sklearn) on the same imbalanced training data, with class weights adjusted, and they achieved comparable performance. When I used the two pre-trained models to predict on a new dataset, the LR and SVM models predicted a similar number of instances as positive, and the predicted instances overlap to a large extent.

However, when I looked at the probability scores of the instances classified as positive, the distribution for LR ranges from 0.5 to 1, while for the SVM it starts from around 0.1. I called model.predict(prediction_data) to find out which instances are predicted as each class, and model.predict_proba(prediction_data) to get the probability scores of being classified as 0 (negative) and 1 (positive), assuming both models use the default threshold of 0.5.

There is no error in my code, and I have no idea why the SVM also predicted instances with probability scores < 0.5 as positive. Any thoughts on how to interpret this situation?
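
For context, here is a minimal sketch of the setup described above (the synthetic dataset, variable names, and the SVC(probability=True) setting are my assumptions for illustration, not my actual code):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Imbalanced toy data standing in for the real training / prediction sets
    X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
    X_train, prediction_data, y_train, _ = train_test_split(X, y, random_state=0)

    lr = LogisticRegression(class_weight='balanced').fit(X_train, y_train)
    svm = SVC(class_weight='balanced', probability=True).fit(X_train, y_train)

    for name, model in [('LR', lr), ('SVM', svm)]:
        pred = model.predict(prediction_data)
        proba_pos = model.predict_proba(prediction_data)[:, 1]
        # Range of probability scores among the instances predicted as positive
        print(name, proba_pos[pred == 1].min(), proba_pos[pred == 1].max())

With a setup like this, the LR range starts at 0.5 as expected, while the SVM range can start well below 0.5, which is exactly the situation I'm describing.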


1 Answer

This is a known behaviour in sklearn when it comes to binary classification problems with SVC(); it is reported, for instance, in these GitHub issues (here and here). Moreover, it is also reported in the User Guide, where it is said that:

In addition, the probability estimates may be inconsistent with the scores: the “argmax” of the scores may not be the argmax of the probabilities; in binary classification, a sample may be labeled by predict as belonging to the positive class even if the output of predict_proba is less than 0.5; and similarly, it could be labeled as negative even if the output of predict_proba is more than 0.5.

or directly within the libsvm FAQ, where it is said that:

Let's just consider two-class classification here. After probability information is obtained in training, we do not have prob >= 0.5 if and only if decision value >= 0.

All in all, the point is that:

  • on one side, predictions are based on decision_function values: if the decision value computed on a new instance is positive, the predicted class is the positive class, and vice versa.

  • on the other side, as stated within one of the GitHub issues, np.argmax(self.predict_proba(X), axis=1) != self.predict(X), which is where the inconsistency comes from. In other words, in order to always have consistency on binary classification problems, you would need a classifier whose predictions are based on the output of predict_proba() (which is, by the way, what you get when considering calibrators), like so (see also the short demonstration sketch after this list):

     def predict(self, X):
         y_proba = self.predict_proba(X)
         return np.argmax(y_proba, axis=1)
    

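To make the two points above concrete, here is a minimal sketch (the synthetic dataset and parameters are made up for illustration) showing that SVC's predict() follows the sign of decision_function(), while the argmax of predict_proba() can disagree with it:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
    svc = SVC(class_weight='balanced', probability=True, random_state=0).fit(X, y)

    pred = svc.predict(X)
    dec = svc.decision_function(X)
    proba_argmax = np.argmax(svc.predict_proba(X), axis=1)

    # predict() agrees with the sign of decision_function() ...
    print(np.array_equal(pred, (dec > 0).astype(int)))  # expected: True
    # ... but it may disagree with the argmax of predict_proba()
    print(np.array_equal(pred, proba_argmax))  # can be False
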
I'd also suggest this post on the topic.
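
And if you go down the calibration route, a minimal sketch (again with made-up data) could look like the following; CalibratedClassifierCV derives its predictions from the calibrated probabilities, so predict() and predict_proba() stay consistent:

    import numpy as np
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

    # Wrap the SVC in a calibrator; the wrapper's predict() is the argmax
    # of its predict_proba() by construction.
    calibrated_svc = CalibratedClassifierCV(SVC(class_weight='balanced'), cv=5)
    calibrated_svc.fit(X, y)

    pred = calibrated_svc.predict(X)
    proba_argmax = np.argmax(calibrated_svc.predict_proba(X), axis=1)
    print(np.array_equal(pred, proba_argmax))  # True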

    Thanks for the reply. I'll check it out and try some calibration methods to calibrate these classifiers. – Anqi Dec 02 '21 at 18:55