I'm working on a binary classification problem. I had this situation that I used the logistic regression and support vector machine model imported from sklearn. These two models were fit with the same , imbalanced training data and class weights were adjusted. And they have achieved comparable performances. When I used these two pre-trained models to predict a new dataset. The LR model and the SVM models predicted similar number of instances as positives. And the predicted instances share a big overlap.
However, when I looked at the probability scores of being classified as positives, the distribution by LR is from 0.5 to 1 while the SVM starts from around 0.1. I called the function model.predict(prediction_data)
to find out the instances predicted as each class and the function
model.predict_proba(prediction_data)
to give the probability scores of being classified as 0(neg) and 1(pos), and assume they all have a default threshold 0.5.
There is no error in my code and I have no idea why the SVM predicted instances with probability scores < 0.5 as positives as well. Any thoughts on how to interpret this situation?