34

I would like to get a confidence score for each prediction the classifier makes, showing how sure it is that the prediction is correct.

I want something like this:

How sure is the classifier on its prediction?

Class 1: 81% that this is class 1
Class 2: 10%
Class 3: 6%
Class 4: 3%

Samples of my code:

features_train, features_test, labels_train, labels_test = cross_validation.train_test_split(main, target, test_size = 0.4)

# Determine amount of time to train
t0 = time()
model = SVC()
#model = SVC(kernel='poly')
#model = GaussianNB()

model.fit(features_train, labels_train)

print 'training time: ', round(time()-t0, 3), 's'

# Determine amount of time to predict
t1 = time()
pred = model.predict(features_test)

print 'predicting time: ', round(time()-t1, 3), 's'

accuracy = accuracy_score(labels_test, pred)

print 'Confusion Matrix: '
print confusion_matrix(labels_test, pred)

# Accuracy in the 0.9333, 0.9667, 1.0 range
print accuracy



model.predict(sub_main)

# Determine amount of time to predict
t1 = time()
pred = model.predict(sub_main)

print 'predicting time: ', round(time()-t1, 3), 's'

print ''
print 'Prediction: '
print pred

I suspect that I would use the score() function, but I seem to keep implementing it incorrectly. I don't know whether that's the right function, so how would one get the confidence percentage of a classifier's prediction?

user3377126
  • 1
    really helpful question. is there a way to associate the Class names with probabilities as well? for example if i get the following list of probabilities for a input [0.33 0.25 0.75]. i know that the third one will be picked, but which class does the third one refer to? – AbtPst Dec 18 '15 at 15:28
  • 1
    The probabilities correspond to `classifier.classes_`. But they are nonsense if the dataset is small :-( . Moreover, they are not guaranteed to match up with `classifier.predict()` :'( . [link to docs page](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC.predict) – AneesAhmed777 Jun 23 '17 at 16:43

3 Answers

33

Per the SVC documentation, it looks like you need to change how you construct the SVC:

model = SVC(probability=True)

and then use the predict_proba method:

class_probabilities = model.predict_proba(sub_main)
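
As a fuller illustration, here is a minimal sketch that trains an SVC with probability=True and prints each class's probability as a percentage. The iris dataset and the variable names X, y, X_train and X_test are assumptions for illustration; they stand in for the asker's main, target and sub_main.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: iris stands in for the asker's own data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# probability=True makes fit() also calibrate probabilities (slower training).
model = SVC(probability=True, random_state=0)
model.fit(X_train, y_train)

# One row per sample, one column per class, in the order of model.classes_.
class_probabilities = model.predict_proba(X_test[:1])

for label, p in zip(model.classes_, class_probabilities[0]):
    print(f"Class {label}: {p:.0%}")
```

Note that SVC derives these probabilities from an internal cross-validated calibration step, so on small datasets they can be unreliable and may occasionally disagree with predict().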
Justin Peel
  • 2
    Ah okay, thanks! And how would you translate class_probabilities into percentage form? For example, I got [[1.614297e-03 3.99785477e-04 5.44054423e-02 9.9254921e-01]] as the output, but I don't know how to interpret these values, let alone convert them myself. What exactly do these values mean? – user3377126 Jun 30 '15 at 15:57
  • 1
    @user3377126 How did you interpret the values? – manish Prasad Mar 12 '19 at 12:05
  • Is the probability the same as confidence? `predict_proba` returns the probability/likelihood of that observation belonging to a particular class. How can we find the confidence with which that likelihood is determined? – The Great Jan 17 '22 at 13:53
  • If you have time, can help with this related question. - https://stats.stackexchange.com/questions/560774/likelihood-vs-confidence-in-layman-terms – The Great Jan 17 '22 at 13:55
16

For estimators that implement the predict_proba() method, as Justin Peel suggested, you can simply call predict_proba() to get a probability for each prediction.

For estimators that do not implement predict_proba(), you can construct a confidence interval yourself using the bootstrap (repeatedly computing your point estimate on many resampled sub-samples).

Let me know if you need any detailed examples to demonstrate either of these two cases.
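
One possible reading of the bootstrap suggestion can be sketched as follows. This is an interpretation, not necessarily what the answer's author had in mind: the model is refit on many bootstrap resamples of the training data, and the share of refits that vote for each class serves as a rough confidence score. LinearSVC is used because it has no predict_proba(); the iris data, n_rounds and the other variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.utils import resample

# Assumption: iris stands in for the asker's own data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

n_rounds = 50  # hypothetical number of bootstrap rounds
classes = np.unique(y_train)
votes = np.zeros((len(X_test), len(classes)))

for seed in range(n_rounds):
    # Draw a bootstrap sample (same size, with replacement) and refit.
    X_boot, y_boot = resample(X_train, y_train, random_state=seed)
    clf = LinearSVC(max_iter=10000, random_state=seed).fit(X_boot, y_boot)
    pred = clf.predict(X_test)
    # Count one vote per test sample for the class this refit predicted.
    votes[np.arange(len(X_test)), np.searchsorted(classes, pred)] += 1

# Fraction of bootstrap models voting for each class; each row sums to 1.
vote_share = votes / n_rounds

for label, share in zip(classes, vote_share[0]):
    print(f"Class {label}: {share:.0%} of bootstrap models")
```

A vote share measures how stable the prediction is under resampling of the training set, which is related to, but not the same as, a calibrated class probability.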

Jianxun Li
  • Ah okay, thanks! And how would you translate class_probabilities into percentage form? For example, I got [[1.614297e-03 3.99785477e-04 5.44054423e-02 9.9254921e-01]] as the output, but I don't know how to interpret these values, let alone convert them myself. What exactly do these values mean? – user3377126 Jun 30 '15 at 15:57
  • 5
    @user3377126 They are already probabilities; multiply by 100 to read them as percentages. :) Each row should sum to 1 (up to floating-point rounding). The last element is about 0.9925, which means the algorithm predicts that the sample belongs to this class with a probability of about 99.3%. Note that `e-03` is just scientific notation. – Jianxun Li Jun 30 '15 at 16:00
  • Ah I see now, thank you! :) I would have accepted your answer, but since Justin Peel commented first with the example that worked for me, I decided to give it to him, sorry about that but thanks for the advice! – user3377126 Jun 30 '15 at 17:36
  • 1
    No problem at all. :) Glad that we both could help. – Jianxun Li Jun 30 '15 at 17:37
  • @JianxunLi - Could you please elaborate on the second case where the predict_proba() method is not provided – Dreams Jan 28 '19 at 07:40
0

Using the code below, you will get the top class names with the predicted probability for each sample. You can change no_of_class to report as many classes as you need.

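import numpy as np

# model must support predict_proba (for SVC, construct it with probability=True)
probas1 = model.predict_proba(sub_main)
no_of_class = 4  # number of top classes to report per sample

# Column indices of the no_of_class most probable classes, highest first
top_classes1 = np.argsort(-probas1, axis=1)[:, :no_of_class]

# Map column indices to class labels and pick out the matching probabilities
class_labels1 = [model.classes_[top_classes1[i]] for i in range(len(top_classes1))]
top_confidence1 = [probas1[i][top_classes1[i]] for i in range(len(top_classes1))]

for i in range(len(class_labels1)):
    for j in range(no_of_class):
        print(f"Sample {i}: {class_labels1[i][j]} :: {top_confidence1[i][j]}")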

NOTE: You can also convert this into a DataFrame, with one column for the predicted class and another for its predicted probability.
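
A minimal sketch of that DataFrame conversion, assuming pandas is available. The classes and probas arrays below are hypothetical values standing in for model.classes_ and model.predict_proba(sub_main); the column names predicted_class and confidence are also just illustrative choices.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for model.classes_ and model.predict_proba(sub_main).
classes = np.array(["a", "b", "c", "d"])
probas = np.array([
    [0.10, 0.81, 0.06, 0.03],
    [0.25, 0.05, 0.60, 0.10],
])

# One column per class, one row per sample.
df = pd.DataFrame(probas, columns=classes)

# Add the most probable class and its probability for each row.
best = probas.argmax(axis=1)
df["predicted_class"] = classes[best]
df["confidence"] = probas[np.arange(len(probas)), best]

print(df)
```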

Moritz Ringler