Scikit learn SVC predict probability doesn't work as expected

Question

I built a sentiment analyzer using an SVM classifier. I trained the model with probability=True, and it gave me probabilities. But when I pickled the model and loaded it again later, probability prediction no longer worked.

The model:

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC

pipeline_svm = Pipeline([
    ('bow', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('classifier', SVC(probability=True)),
])

# pipeline parameters to automatically explore and tune
param_svm = [
  {'classifier__C': [1, 10, 100, 1000], 'classifier__kernel': ['linear']},
  {'classifier__C': [1, 10, 100, 1000], 'classifier__gamma': [0.001, 0.0001], 'classifier__kernel': ['rbf']},
]

grid_svm = GridSearchCV(
    pipeline_svm,
    param_grid=param_svm,
    refit=True,
    n_jobs=-1,
    scoring='accuracy',
    cv=StratifiedKFold(n_splits=5),
)

import pickle

with open('svm_sentiment_analyzer.pkl', 'rb') as f:
    svm_detector_reloaded = pickle.load(f)
print(svm_detector_reloaded.predict(["Today is an awesome day"])[0])

Gives me:

AttributeError: predict_proba is not available when probability=False
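To see where this error comes from, here is a minimal sketch assuming a recent scikit-learn and synthetic data from make_classification (not the question's sentiment data). predict_proba is only exposed by an SVC that was fit with probability=True, and pickling preserves that setting, so a reloaded model that raises this error was most likely fit (or saved) without it:

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Small synthetic binary problem standing in for the real sentiment data.
X, y = make_classification(n_samples=60, n_features=5, random_state=0)

# Fit one SVC with probability estimates and one without.
with_proba = SVC(probability=True, random_state=0).fit(X, y)
without_proba = SVC(random_state=0).fit(X, y)

# Round-trip both models through pickle (in memory instead of a file).
reloaded_with = pickle.loads(pickle.dumps(with_proba))
reloaded_without = pickle.loads(pickle.dumps(without_proba))

# The probability setting survives pickling, so predict_proba still works.
probs = reloaded_with.predict_proba(X[:3])
print(probs.shape)

# Without probability=True, calling predict_proba raises AttributeError
# (the exact message differs between scikit-learn versions).
try:
    reloaded_without.predict_proba(X[:3])
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```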

Can you show the code where you originally save the object to `svm_sentiment_analyzer.pkl`? – Bert Kellerman, Nov 19 '18 at 07:27
Did you try to call `predict_proba` rather than `predict` when getting that `AttributeError`? Otherwise this is a bit puzzling. – Davide Fiocco, Dec 19 '18 at 00:32

score 16 · Answer 1 · edited Mar 13 '20 at 01:25


Use: SVC(probability=True)

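or, since GridSearchCV itself has no probability parameter, set it on the SVC step of the pipeline before running the search:

pipeline_svm.set_params(classifier__probability=True)

grid_svm = GridSearchCV(
    pipeline_svm,
    param_grid=param_svm,
    refit=True,
    n_jobs=-1,
    scoring='accuracy',
    cv=StratifiedKFold(n_splits=5),
)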
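A minimal end-to-end sketch of this pipeline setup, assuming a recent scikit-learn (StratifiedKFold(n_splits=...) rather than the old label-based constructor) and a hypothetical toy corpus in place of real training data. probability=True is set on the SVC step, GridSearchCV refits the best pipeline, and predict_proba still works after a pickle round trip:

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus (1 = positive, 0 = negative), repeated for more samples.
texts = [
    "great day", "awesome movie", "love this", "really good", "very nice", "happy times",
    "bad day", "awful movie", "hate this", "really poor", "very sad", "terrible times",
] * 2
labels = ([1] * 6 + [0] * 6) * 2

# probability=True lives on the SVC step, not on GridSearchCV.
pipeline_svm = Pipeline([
    ("bow", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("classifier", SVC(probability=True, random_state=0)),
])
param_svm = [{"classifier__C": [1, 10], "classifier__kernel": ["linear"]}]

grid_svm = GridSearchCV(
    pipeline_svm,
    param_grid=param_svm,
    refit=True,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=3),
)
grid_svm.fit(texts, labels)

# Pickle round trip in memory; writing to a file behaves the same way.
reloaded = pickle.loads(pickle.dumps(grid_svm))
probs = reloaded.predict_proba(["Today is an awesome day"])
print(probs)
```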

answered Mar 12 '20 at 23:03 by Neofytos Neocleous · edited Mar 13 '20 at 01:25 by eastclintw00d

score 7 · Answer 2 · edited Dec 30 '20 at 10:08


Adding probability=True when initializing the classifier, as suggested above, resolved my error:

from sklearn.svm import SVC

clf = SVC(kernel='rbf', C=1e9, gamma=1e-07, probability=True).fit(xtrain, ytrain)
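A self-contained version of that line, assuming synthetic data from make_classification in place of xtrain/ytrain and default C/gamma instead of the answer's values. Since an SVC fit without probability=True does not expose predict_proba, hasattr can serve as a guard, falling back to decision_function scores:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the answer's xtrain/ytrain.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
xtrain, xtest, ytrain, ytest = train_test_split(X, y, random_state=0)

# probability=True is what makes predict_proba available after fitting.
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(xtrain, ytrain)

# hasattr is False for an SVC fit with probability=False,
# so it can guard the call instead of catching AttributeError.
if hasattr(clf, "predict_proba"):
    scores = clf.predict_proba(xtest)[:, 1]  # probability of class 1
else:
    scores = clf.decision_function(xtest)    # signed distance to the margin
print(scores[:5])
```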

answered Dec 29 '20 at 23:48 by Dinesh Marimuthu · edited Dec 30 '20 at 10:08 by Salma Elshahawy

score 2 · Answer 3 · answered Nov 22 '19 at 16:15

You can use CalibratedClassifierCV to get probability outputs. It also works with LinearSVC, which has no predict_proba of its own.

from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

model_svc = LinearSVC()
model = CalibratedClassifierCV(model_svc)
model.fit(X_train, y_train)

Save the model using pickle:

import pickle
filename = 'linearSVC.sav'
with open(filename, 'wb') as f:
    pickle.dump(model, f)

Load the model using pickle.load:

with open(filename, 'rb') as f:
    model = pickle.load(f)

Now make predictions:

# pred holds the new samples to classify, with the same features as X_train
pred_class = model.predict(pred)
probability = model.predict_proba(pred)
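A runnable sketch of the same steps, assuming synthetic data from make_classification and an in-memory pickle round trip (pickle.dumps/pickle.loads) instead of the linearSVC.sav file; cv=3 is an arbitrary choice:

```python
import pickle

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic data: the first 150 rows train the model, the rest play the role of `pred`.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, y_train = X[:150], y[:150]
pred = X[150:]

# LinearSVC has no predict_proba; the calibration wrapper adds one.
model = CalibratedClassifierCV(LinearSVC(random_state=0), cv=3)
model.fit(X_train, y_train)

# Save and reload, then predict labels and probabilities.
restored = pickle.loads(pickle.dumps(model))
pred_class = restored.predict(pred)
probability = restored.predict_proba(pred)
print(probability[:3])
```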

Answer 4 · answered Jan 4 '19 at 22:24 by Davide Fiocco

In case it helps: pickling the model with

import pickle
with open('svm_sentiment_analyzer.pkl', 'wb') as f:
    pickle.dump(grid_svm, f)

and loading the model and predicting with

svm_detector_reloaded = pickle.load(open('svm_sentiment_analyzer.pkl', 'rb'))
print(svm_detector_reloaded.predict_proba(["Today is an awesome day"])[0])

returned two probabilities as expected. To rerun your code, I trained the model on a pandas DataFrame, sents, with

grid_svm.fit(sents.Sentence.values, sents.Positive.values)

Best practices for model serialization (e.g. using joblib) are described at https://scikit-learn.org/stable/modules/model_persistence.html
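A minimal sketch of the joblib approach from that guide, assuming synthetic data and a temporary file whose name is purely illustrative. joblib.dump and joblib.load take the place of pickle.dump and pickle.load, and the reloaded SVC keeps probability=True:

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = SVC(probability=True, random_state=0).fit(X, y)

# Dump to a temporary file and load it back; "svm_model.joblib" is a hypothetical name.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "svm_model.joblib")
    joblib.dump(model, path)
    loaded = joblib.load(path)

# The reloaded model gives the same probabilities as the original.
original_probs = model.predict_proba(X[:5])
loaded_probs = loaded.predict_proba(X[:5])
print(loaded_probs)
```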

score 1 · Answer 5 · answered Dec 16 '20 at 20:10

Use predict_proba to compute the probability scores needed for the ROC/AUC calculation. The issue is the y_score argument of roc_curve, which must be a continuous score rather than a predicted label. You can compute it as in the following code:

from sklearn import svm
from sklearn.metrics import accuracy_score, auc, roc_curve

# Classifier - Algorithm - SVM
# fit the training dataset on the classifier
SVM = svm.SVC(C=1.0, kernel='linear', degree=3, gamma='auto', probability=True)
SVM.fit(Train_X_Tfidf, Train_Y)
# predict the labels on validation dataset
predictions_SVM = SVM.predict(Test_X_Tfidf)
# Use accuracy_score function to get the accuracy
print("SVM Accuracy Score -> ", accuracy_score(Test_Y, predictions_SVM))

# probability of the positive class (column 1 corresponds to SVM.classes_[1])
probs = SVM.predict_proba(Test_X_Tfidf)
preds = probs[:, 1]
fpr, tpr, threshold = roc_curve(Test_Y, preds)
print("SVM Area under curve -> ", auc(fpr, tpr))

Note the difference between accuracy_score and auc(): accuracy_score compares predicted labels, while the ROC curve and its area need the prediction scores.
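A self-contained sketch of the same comparison, assuming synthetic data from make_classification in place of the TF-IDF features. It also shows that roc_auc_score gives the same area as auc(fpr, tpr) when both are fed the probability column:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = SVC(kernel="linear", probability=True, random_state=0).fit(X_train, y_train)

# Accuracy compares hard labels.
accuracy = accuracy_score(y_test, model.predict(X_test))

# ROC/AUC need a continuous score for the positive class (model.classes_[1]).
scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, scores)
area = auc(fpr, tpr)

# roc_auc_score computes the same area directly from the scores.
direct = roc_auc_score(y_test, scores)
print(accuracy, area, direct)
```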
