I am using an SVM from sklearn (Python 3). The class returned by `predict` is not the class with the highest probability in the `predict_proba` output. Can somebody explain this?

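    from sklearn import svm
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    # TF-IDF features feeding a linear SVC with probability estimates enabled
    clf = Pipeline([('vect', TfidfVectorizer()), ('clf', svm.SVC())])
    parameters = {'vect__ngram_range': [(1, 2)], 'vect__stop_words': ['english'],
                  'vect__lowercase': [True], 'clf__C': [1, 2, 5, 10, 20, 100],
                  'clf__kernel': ['linear'], 'clf__class_weight': ['balanced'],
                  'clf__probability': [True]}
    vec_clf = GridSearchCV(clf, parameters, scoring='f1_weighted')
    vec_clf.fit(x_train, y_train)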

The prediction and print statements:

    pred_data = model.predict(input_series)
    probability_lst = model.predict_proba(input_series)[0]
    print("probability lst: ", probability_lst)
    print("predicted data: ", pred_data)
    print("classes: ", model.best_estimator_.classes_)

This is the code I am using. It prints the following output:

    probability lst:  [ 0.29004279  0.38866277  0.04441053  0.1173824   0.0300703   0.0983329   0.03109831]
    predicted data:  ['1']
    classes:  ['1' '2' '3' '4' '5' '6' '7']

Logically, it should predict class "2", as that class has the highest probability. Please explain this output.

user2550098
  • Why are you accessing `model.predict_proba(input_series)[0]`, emphasis on the `[0]`? – erip Nov 08 '17 at 11:38
  • Possible duplicate of [Confusing probabilities of the predict\_proba of scikit-learn's svm](https://stackoverflow.com/questions/30674164/confusing-probabilities-of-the-predict-proba-of-scikit-learns-svm) – Vivek Kumar Nov 08 '17 at 11:42
  • It was a list of lists, so I took `[0]`. – user2550098 Nov 08 '17 at 13:02
  • @VivekKumar As mentioned in the other link you shared, I tried `decision_function` as well, but I could not understand the output. Here is the output. – user2550098 Nov 08 '17 at 13:53
  • dec fun: [[ 4.24585270e-02 4.96189478e-01 2.49730266e-01 5.74115210e-01 2.91674509e-01 5.43911786e-01 6.57224118e-01 2.65190816e-01 7.28700141e-01 3.96157293e-01 1.10457723e+00 -2.27747710e-01 1.64406656e-01 -2.79911772e-01 8.23904997e-02 3.42637709e-01 -9.38947411e-05 3.42586750e-01 -4.11551164e-01 -8.49533033e-02 3.48155592e-01]] – user2550098 Nov 08 '17 at 13:53
  • I have 7 classes, but this output has 21 values. How can I get the class name with the highest score? – user2550098 Nov 08 '17 at 13:59
  • This is because SVC does not directly support multiclass classification; what you're seeing is the one-vs-one classification output. The class name with the "highest score" will be the one output by `predict`. – piman314 Nov 09 '17 at 09:43

1 Answer

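I have read some of the docs on `predict`, and I think it works differently from `predict_proba`, so the two are not guaranteed to agree. For `SVC`, `predict` is based on the one-vs-one decision function, while `predict_proba` comes from a separate probability model (Platt scaling) that is fitted with internal cross-validation.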
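To see how `predict` could arrive at class "1", here is a rough sketch that counts one-vs-one votes from the 21 `decision_function` values posted in the comments under the question. It assumes those values belong to the same input, that the columns are ordered as pairs (1 vs 2, 1 vs 3, ..., 6 vs 7), and that a non-negative value is a vote for the first class of the pair. The variable names are only for illustration.

```python
from itertools import combinations

classes = ['1', '2', '3', '4', '5', '6', '7']

# decision_function values copied from the comment (one row, 21 columns)
dec = [4.24585270e-02, 4.96189478e-01, 2.49730266e-01, 5.74115210e-01,
       2.91674509e-01, 5.43911786e-01, 6.57224118e-01, 2.65190816e-01,
       7.28700141e-01, 3.96157293e-01, 1.10457723e+00, -2.27747710e-01,
       1.64406656e-01, -2.79911772e-01, 8.23904997e-02, 3.42637709e-01,
       -9.38947411e-05, 3.42586750e-01, -4.11551164e-01, -8.49533033e-02,
       3.48155592e-01]

# 7 classes give 7 * 6 / 2 = 21 one-vs-one pairs
pairs = list(combinations(classes, 2))
assert len(pairs) == len(dec)

# Assumption: score >= 0 votes for the first class of the pair, else the second
votes = {c: 0 for c in classes}
for (first, second), score in zip(pairs, dec):
    votes[second if score < 0 else first] += 1

predicted = max(classes, key=lambda c: votes[c])
print(votes)      # {'1': 6, '2': 5, '3': 2, '4': 3, '5': 0, '6': 4, '7': 1}
print(predicted)  # 1
```

Under these assumptions class "1" collects 6 votes and class "2" collects 5, which matches the `predict` output. The 1 vs 2 score (about 0.04) is very close to zero, so it is not surprising that the separately fitted probability model ranks class "2" slightly higher.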

So I am now using `predict_proba` and taking the highest probability and its corresponding class (from `model.classes_`).
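A minimal sketch of that lookup, using the probabilities and class labels printed in the question. The helper name `most_probable_class` is made up for illustration; with NumPy arrays, `np.argmax` would do the same job.

```python
def most_probable_class(probabilities, classes):
    """Return (class label, probability) for the largest probability."""
    best_idx = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return classes[best_idx], probabilities[best_idx]

# Values copied from the printed output in the question
classes = ['1', '2', '3', '4', '5', '6', '7']
probability_lst = [0.29004279, 0.38866277, 0.04441053, 0.1173824,
                   0.0300703, 0.0983329, 0.03109831]

best_class, best_prob = most_probable_class(probability_lst, classes)
print(best_class, best_prob)  # 2 0.38866277
```

With the fitted model from the question, the same helper could be called as `most_probable_class(model.predict_proba(input_series)[0], model.best_estimator_.classes_)`. Note that this result can differ from `model.predict(input_series)`.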

We can close this ticket.

user2550098