
I want to evaluate a logistic regression model (binary event) using two measures:

1. `model.score` and the confusion matrix, which give me 81% classification accuracy
2. The ROC curve (using AUC), which gives back a value of 50%

Are these two results in contradiction? Is it possible I'm missing something? I still can't find it.

    # Assumes log_model is an already fitted LogisticRegression and that
    # X_test / y_test are the held-out features and labels.
    from sklearn.metrics import accuracy_score, confusion_matrix, roc_curve, roc_auc_score

    y_pred = log_model.predict(X_test)          # hard 0/1 class predictions
    print(accuracy_score(y_test, y_pred))

    cm = confusion_matrix(y_test, y_pred)
    print(y_test.count())                       # number of test samples
    print(cm)

    # roc_curve returns (fpr, tpr, thresholds) in that order
    fpr, tpr, _ = roc_curve(y_test, y_pred, drop_intermediate=False)
    roc = roc_auc_score(y_test, y_pred)         # AUC computed from the hard labels


Parsifal
  • I think this might be a question better suited for [Cross Validated Stack](https://stats.stackexchange.com) – Matthew Barlowe May 12 '19 at 14:46
  • Also this is a good [answer](https://stackoverflow.com/questions/47104129/getting-a-low-roc-auc-score-but-a-high-accuracy) – Matthew Barlowe May 12 '19 at 14:48
  • In addition to Matthew Barlowe's very relevant comments, I assume you are aware that you have a "little bit" of class imbalance? – Calimo May 12 '19 at 15:10
  • Possible duplicate of [Getting a low ROC AUC score but a high accuracy](https://stackoverflow.com/questions/47104129/getting-a-low-roc-auc-score-but-a-high-accuracy) – Calimo May 12 '19 at 15:10
  • @Calimo yes, I would like to better understand which indicator is best suited for my case. – Parsifal May 12 '19 at 22:35
  • @Calimo yes, it was pretty unbalanced, what should I do in this case? Any suggestions? – Parsifal May 14 '19 at 17:29
  • @lucapellerossapelles you've got several suggestions above. I don't think you can get anything more here, as you don't have any specific programming question. – Calimo May 14 '19 at 18:30

1 Answer


The accuracy score is calculated under the assumption that a class is selected when its predicted probability is above 50%. This means you are looking at only one case (one working point) out of many. Suppose you wanted to classify an instance as '0' even if it has a probability greater than 30% (this may happen if one of your classes is more important to you and its a-priori probability is very low). In that case you would get a very different confusion matrix, and a different accuracy ([TP+TN]/[ALL]).

The ROC AUC score examines all of these working points and gives you an estimate of the overall model. A score of 50% means the model is equivalent to selecting classes at random according to their a-priori probabilities; you want the ROC AUC to be much higher before calling the model good.

So in the case above you can say that your model does not have good predictive strength. In fact, a better predictor would be to simply predict everything as "1" - in your case that would lead to an accuracy above 99%.
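To make the "working point" idea concrete, here is a minimal sketch (assuming, as in the question, a fitted scikit-learn `log_model` that exposes `predict_proba`, plus `X_test`/`y_test`): it shows the accuracy you get at a few different probability thresholds, the always-predict-"1" baseline mentioned above, and the ROC AUC computed from the predicted probabilities, which summarizes all thresholds at once.

    # Illustrative sketch only: assumes log_model, X_test and y_test exist as in
    # the question and that log_model exposes predict_proba (e.g. LogisticRegression).
    import numpy as np
    from sklearn.metrics import accuracy_score, roc_auc_score

    proba = log_model.predict_proba(X_test)[:, 1]   # P(class == 1) for each sample

    # Accuracy at several "working points" (decision thresholds)
    for threshold in (0.3, 0.5, 0.7):
        y_pred_t = (proba >= threshold).astype(int)
        print(f"threshold={threshold:.1f}  accuracy={accuracy_score(y_test, y_pred_t):.3f}")

    # Majority-class baseline: predict "1" for every sample
    print("always predict 1:", accuracy_score(y_test, np.ones(len(y_test), dtype=int)))

    # ROC AUC over all thresholds, computed from probabilities rather than hard labels
    print("ROC AUC:", roc_auc_score(y_test, proba))

(The answer linked in the comments makes a related point: passing hard 0/1 predictions to `roc_auc_score` evaluates only a single threshold, so the predicted probabilities are usually what you want to pass.)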

Roee Anuar