
I am working on a classification project, and I am evaluating different ML models based on their training accuracy, testing accuracy, confusion matrix, and AUC score. I am now stuck on understanding the difference between the score I get by calculating the accuracy of a model on the test set (X_test) and the AUC score.

If I am correct, both metrics measure how well an ML model is able to predict the correct class for previously unseen data. I also understand that for both, the higher the number, the better, as long as the model is not over-fit or under-fit.

Assuming an ML model is neither over-fit nor under-fit, what is the difference between the test accuracy score and the AUC score?

I don't have a background in math and stats, and pivoted towards data science from a business background, so I would appreciate an explanation that a business person can understand.

  • I recommend reading [this answer](https://datascience.stackexchange.com/a/807), it's really well written and should be able to clear it up for you! – hmhmmm Mar 28 '20 at 19:01

1 Answer


Both terms quantify the quality of a classification model, but in different ways: the accuracy summarizes a single set of predictions, which means it describes a single confusion matrix. The AUC (area under the curve) represents the trade-off between the true-positive rate (tpr) and the false-positive rate (fpr) across multiple confusion matrices, generated for different fpr values of the same classifier. A confusion matrix has the form:

[Figure: 2x2 confusion matrix with rows = actual class, columns = predicted class, and cells TP, FN, FP, TN]

1) The accuracy is a measure for a single confusion matrix and is defined as:

accuracy = (TP + TN) / (TP + FP + TN + FN)

where TP = true positives, TN = true negatives, FP = false positives and FN = false negatives (the count of each).
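
As a small illustration (the names `y_test` and `y_pred` are hypothetical hard class predictions, e.g. the output of something like `model.predict(X_test)`), the accuracy can be computed directly from these four counts or with scikit-learn's helper:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical example data: true labels and hard (already thresholded) predictions
y_test = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])

# The four cells of the binary confusion matrix
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

# accuracy = (TP + TN) / (TP + FP + TN + FN)
acc_manual = (tp + tn) / (tp + fp + tn + fn)
acc_sklearn = accuracy_score(y_test, y_pred)

print(acc_manual, acc_sklearn)  # identical
```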

2) The AUC measures the area under the ROC curve (receiver operating characteristic), which is the trade-off curve between the true-positive rate and the false-positive rate. For each choice of the decision threshold, a false-positive rate (fpr) and its corresponding true-positive rate (tpr) are obtained. I.e., for a given classifier, an fpr of 0, 0.1, 0.2 and so forth is accepted, and for each fpr the corresponding tpr is evaluated. You therefore get a function tpr(fpr) that maps the interval [0, 1] onto the same interval, because both rates are defined on that interval. The area under this curve is called the AUC; it lies between 0 and 1, and a random classifier is expected to yield an AUC of 0.5.

[Figures: example ROC curves plotting tpr against fpr, with the diagonal corresponding to a random classifier]

The AUC, as it is the area under the curve, is defined as:

$$\mathrm{AUC} = \int_0^1 \mathrm{tpr}(\mathrm{fpr}) \,\mathrm{d}\,\mathrm{fpr}$$

However, in real (and finite) applications, the ROC curve is a step function, and the AUC is determined by a weighted sum over these steps (in practice, a trapezoidal sum).
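
For concreteness, here is a minimal sketch with scikit-learn (the labels `y_test` and scores `y_score` are hypothetical example data, with `y_score` standing in for something like `model.predict_proba(X_test)[:, 1]`): it computes the ROC points and then the area, both with the dedicated scorer and as an explicit trapezoidal sum over the steps.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score, auc

# Hypothetical example data: true labels and predicted scores for class 1
y_test  = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.6, 0.8, 0.7, 0.4, 0.2, 0.9, 0.3])

# One (fpr, tpr) point per threshold: the empirical, step-shaped ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_score)

# Area under that curve, once via the dedicated scorer and once as a
# trapezoidal sum over the ROC steps; both give the same number
auc_direct = roc_auc_score(y_test, y_score)
auc_steps = auc(fpr, tpr)

print(auc_direct, auc_steps)
```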

Graphics are from Borgelt's Intelligent Data Mining Lecture.

  • Thank you, this is very helpful! One follow-up question: when I am evaluating the performance of classification algorithms, are you saying that the AUC score is a better metric to rely on than the accuracy score? My slight confusion is that some of my models differ in AUC and test accuracy scores. In this case, I wonder if I should prioritize the model that has the higher test accuracy or the higher AUC score. Thank you! – Arsik36 Mar 28 '20 at 20:50
  • @Arsik36 it's not that simple; in order to get the accuracy you need to define a specific *threshold* (see [Predict classes or class probabilities?](https://stackoverflow.com/questions/51367755/predict-classes-or-class-probabilities)), while AUC measures the performance across a whole range of such thresholds. See [here](https://stackoverflow.com/questions/47104129/getting-a-low-roc-auc-score-but-a-high-accuracy/47111246#47111246) and [here](https://stackoverflow.com/questions/51190809/high-auc-but-bad-predictions-with-imbalanced-data/51192702#51192702) for more details. – desertnaut Mar 28 '20 at 21:25
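
To make the point in the last comment concrete, a rough sketch (reusing the same hypothetical `y_test` and `y_score` as above): the accuracy changes with the probability threshold used to turn scores into classes, whereas the single AUC value is computed from the scores across all thresholds at once.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical example data: true labels and predicted probabilities for class 1
y_test  = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.6, 0.8, 0.7, 0.4, 0.2, 0.9, 0.3])

# Accuracy depends on the threshold chosen to convert scores into classes ...
for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_score >= threshold).astype(int)
    print(f"threshold={threshold}: accuracy={accuracy_score(y_test, y_pred):.3f}")

# ... while the AUC already covers the whole range of thresholds
print(f"AUC={roc_auc_score(y_test, y_score):.3f}")
```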