
I am using SVC (comparing linear and RBF kernels) to classify an imbalanced dataset, so I set class_weight='balanced'.

As noted previously, using class_weight yields probabilities that do not match the labels from predict(): the SVM labels instances as positive when their probability score is greater than about 0.1, rather than the default threshold of 0.5. The sklearn documentation states that this can happen, because predict() derives its labels from decision_function() and not from predict_proba(). Nevertheless, with a threshold other than 0.5, the labels obtained from predict_proba() do match those from predict().
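A minimal sketch of this mismatch, assuming synthetic data from make_classification as a stand-in for the real dataset (the sample size, class ratio and random_state are illustrative choices, not values from the question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic imbalanced data (roughly 10% positives); an assumption for illustration.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           flip_y=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = SVC(kernel="rbf", class_weight="balanced", probability=True, random_state=0)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
scores = clf.decision_function(X_test)
proba = clf.predict_proba(X_test)[:, 1]

# For a binary SVC, predict() follows the sign of decision_function().
pred_from_scores = (scores > 0).astype(int)
# Thresholding the probabilities at 0.5 can give a different labelling.
pred_from_proba = (proba > 0.5).astype(int)

print("agreement predict vs decision_function > 0:", np.mean(pred == pred_from_scores))
print("agreement predict vs predict_proba > 0.5:  ", np.mean(pred == pred_from_proba))
print("positives by predict():", pred.sum(), "| by proba > 0.5:", pred_from_proba.sum())
```

On imbalanced data the second agreement is typically below 1, with predict() flagging more positives than the 0.5 probability cut-off does.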

To understand how this inconsistency affects the results, I calculated the ROC-AUC using both the predict_proba() probabilities and the decision_function() scores, and I got the same result in both cases (ROC-AUC = 0.711).
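The identical ROC-AUC is what one would expect if, for a binary SVC, the probability is a monotonic (sigmoid, Platt-style) transform of the decision score, since ROC-AUC depends only on the ranking of the instances. A small check of that, again on synthetic data that is only an assumed stand-in for the real dataset:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic imbalanced data; parameters are illustrative assumptions.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           flip_y=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = SVC(kernel="rbf", class_weight="balanced", probability=True, random_state=0)
clf.fit(X_train, y_train)

# ROC-AUC only uses the ordering of the scores, not their scale.
auc_scores = roc_auc_score(y_test, clf.decision_function(X_test))
auc_proba = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

print(f"ROC-AUC from decision_function(): {auc_scores:.4f}")
print(f"ROC-AUC from predict_proba():     {auc_proba:.4f}")
```

If the two values agree, the ranking is the same, so rank-based comparisons should not depend on which of the two outputs is used.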

Nevertheless, when I compute the optimal threshold based on Youden's J statistic, I obtain different values: 0.09 when the ROC curve is built from decision_function() and 0.64 when it is built from predict_proba(). EDIT: These differences make sense, because the first threshold is on the scale of the decision scores and the second on the scale of the probabilities.
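The two Youden thresholds can be compared directly by checking whether they select the same instances. The sketch below assumes synthetic data, and youden_threshold is a hypothetical helper written for this illustration, not an sklearn function:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def youden_threshold(y_true, y_score):
    """Return the threshold maximising Youden's J = TPR - FPR (hypothetical helper)."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    # Skip the first entry: roc_curve prepends a sentinel threshold above every score.
    j = tpr[1:] - fpr[1:]
    return thresholds[1:][np.argmax(j)]


# Synthetic imbalanced data; parameters are illustrative assumptions.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           flip_y=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = SVC(kernel="rbf", class_weight="balanced", probability=True, random_state=0)
clf.fit(X_train, y_train)

scores = clf.decision_function(X_test)
proba = clf.predict_proba(X_test)[:, 1]

t_scores = youden_threshold(y_test, scores)  # on the decision-score scale
t_proba = youden_threshold(y_test, proba)    # on the probability scale

# Different numbers, but they may still pick out the same instances.
labels_scores = (scores >= t_scores).astype(int)
labels_proba = (proba >= t_proba).astype(int)
agreement = np.mean(labels_scores == labels_proba)

print(f"threshold on scores: {t_scores:.3f}, on probabilities: {t_proba:.3f}")
print(f"agreement of resulting labels: {agreement:.3f}")
```

High agreement would indicate that the two thresholds mark the same operating point on the ROC curve, just expressed in different units.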

Understanding how sklearn computes probabilities when class_weight is used is useful, because it tells us whether we are using the right probabilities when comparing models with the DeLong test. In my case, shifting the probabilities and calibrating the classifier did not work.

Any thoughts about this?

Hrpereira