
I'm facing a problem for which I can't find any answer. I have a binary classification problem (output Y=0 or Y=1), where Y=1 is the minority class (Y=1 indicates default of a company, with a proportion of 0.02 in the original data frame). I therefore oversampled with the SMOTE algorithm on my training set only (after splitting the data frame into training and testing sets). I trained a logistic regression on that training set (where the proportion of the "default" class is 0.3) and then looked at the ROC curve and the MSE to test whether the algorithm predicts default well. I get very good results in terms of both AUC (AUC=0.89) and MSE (MSE=0.06). However, when I look more precisely and individually at my predictions, I find that 20% of the defaults are not predicted correctly.

Do you have a method to properly evaluate the quality of my predictions (by quality I mean predictions that detect default well)? I thought AUC was a good criterion... Do you also have a method to improve my regression? Thanks

T. Ciffréo

1 Answer


For every classification problem you can build a confusion matrix.

This is a two-way contingency table that lets you see not only the true positives (TP) and true negatives (TN), which are your correct predictions, but also the false positives (FP) and false negatives (FN), and most of the time those are your true interest.
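
For a binary problem it is just a 2x2 table, which you can build in base R with table(); the vectors below are purely made-up placeholders for your true and predicted classes:

# hypothetical true and predicted classes (1 = default)
truth <- c(1, 1, 0, 0, 0, 1, 0, 0)
pred  <- c(1, 0, 0, 1, 0, 1, 0, 0)
table(Predicted = pred, Actual = truth)
#          Actual
# Predicted 0 1
#         0 4 1
#         1 1 2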

FP and FN are the errors your model makes. You can track how well the model detects the positives with sensitivity (TP / (TP + FN), i.e. 1 minus the false-negative rate) and the negatives with specificity (TN / (TN + FP), i.e. 1 minus the false-positive rate).

Note that, for a given model, you generally can't improve one without lowering the other (for example by moving the decision threshold), so sometimes you need to pick which one matters more.
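
To make the trade-off concrete, here is a minimal base-R sketch with made-up labels y (1 = default) and predicted probabilities p_hat; moving the decision threshold pushes sensitivity and specificity in opposite directions:

# toy data, purely illustrative
y     <- c(1, 1, 1, 0, 0, 0, 0, 0)
p_hat <- c(0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1)

sens_spec <- function(threshold) {
  pred <- as.integer(p_hat >= threshold)
  c(sensitivity = sum(pred == 1 & y == 1) / sum(y == 1),
    specificity = sum(pred == 0 & y == 0) / sum(y == 0))
}

sens_spec(0.5)  # sensitivity 0.67, specificity 0.80
sens_spec(0.3)  # sensitivity 1.00, specificity 0.60: more defaults caught, more false alarms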

A good compromise is the F1-score, the harmonic mean of precision and recall (recall being the same as sensitivity).
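
For reference, the F1-score can be computed from the confusion-matrix counts like this (the counts in the example call are made up):

# F1 = harmonic mean of precision and recall
f1 <- function(tp, fp, fn) {
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)   # recall = sensitivity
  2 * precision * recall / (precision + recall)
}
f1(tp = 20, fp = 5, fn = 5)  # 0.8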

So if you're more interested in defaults (let's imagine that default = positive class), you'll prefer a model with higher sensitivity. But don't completely neglect specificity either.

Here is an example in R:

# to get the confusion matrix and several metrics (sensitivity, specificity, ...)
# confusionMatrix(data = predicted classes, reference = true classes)
caret::confusionMatrix(data = sample(iris$Species), reference = iris$Species)
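
Closer to your case, here is a minimal sketch, assuming a fitted logistic regression fit and a test set test whose true labels are in test$default (all three names are hypothetical): threshold the predicted probabilities and read the sensitivity and specificity for the "default" class directly from the confusion matrix. Lowering the 0.5 cutoff will catch more defaults at the cost of more false alarms.

# fit, test and test$default are placeholders for your own objects
p_hat <- predict(fit, newdata = test, type = "response")   # predicted P(default)
pred  <- factor(ifelse(p_hat >= 0.5, "default", "no_default"),
                levels = c("default", "no_default"))
truth <- factor(ifelse(test$default == 1, "default", "no_default"),
                levels = c("default", "no_default"))
# positive = "default" makes sensitivity refer to the default class
caret::confusionMatrix(data = pred, reference = truth, positive = "default")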
RLave
  • Ok, thanks, I'll have a look at this! However, sensitivity and specificity also go into the AUC, which is very good in my example. What is then the difference between looking at the AUC and looking at the confusion matrix? My question might be stupid, I'm sorry, but I don't really see the difference – T. Ciffréo Nov 06 '18 at 14:07
  • https://stats.stackexchange.com/questions/210700/how-to-choose-between-roc-auc-and-f1-score may help you a little. I forgot to mention that the AUC is also a kind of average, somewhat like the F1-score – RLave Nov 06 '18 at 14:10
  • other good answers: https://stackoverflow.com/questions/44172162/f1-score-vs-roc-auc and https://stackoverflow.com/questions/34698161/how-to-interpret-almost-perfect-accuracy-and-auc-roc-but-zero-f1-score-precisio#34698935 – RLave Nov 06 '18 at 14:13
  • I recommend looking at the precision-recall curve. Recall is the same as sensitivity, but precision is the fraction of predicted positives that are actually true, i.e. TP/(TP+FP). On the other hand, specificity, which is used in the ROC AUC, gets higher for every true negative you predict. If you have imbalanced data that is mostly negative, then true negatives are easy to predict and will inflate the AUC. https://www.kaggle.com/general/7517 – see24 Nov 06 '18 at 14:52
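
As an illustration of the last comment, here is a minimal base-R sketch of a precision-recall curve; y and p_hat are made-up placeholders for the true labels and predicted probabilities on your test set:

# hypothetical true labels (1 = default) and predicted probabilities
y     <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
p_hat <- c(0.9, 0.8, 0.4, 0.7, 0.5, 0.3, 0.3, 0.2, 0.1, 0.1)

thresholds <- sort(unique(p_hat), decreasing = TRUE)
pr <- t(sapply(thresholds, function(thr) {
  pred <- as.integer(p_hat >= thr)
  tp <- sum(pred == 1 & y == 1)
  fp <- sum(pred == 1 & y == 0)
  fn <- sum(pred == 0 & y == 1)
  c(threshold = thr, precision = tp / (tp + fp), recall = tp / (tp + fn))
}))
pr   # precision and recall at each threshold
plot(pr[, "recall"], pr[, "precision"], type = "b",
     xlab = "Recall", ylab = "Precision", main = "Precision-recall curve")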