
I am working on a classification project, and I am evaluating different ML models based on their training accuracy, testing accuracy, confusion matrix, and AUC score. I am now stuck on understanding the difference between the score I get by calculating the accuracy of a model on the test set (X_test) and the AUC score.

If I am correct, both metrics measure how well an ML model is able to predict the correct class for previously unseen data. I also understand that for both, the higher the number, the better, as long as the model is not over-fit or under-fit.

Assuming an ML model is neither over-fit nor under-fit, what is the difference between the test accuracy score and the AUC score?

I don't have a background in math and stats, and pivoted towards data science from a business background, so I would appreciate an explanation that a business person can understand.

  • I recommend reading [this answer](https://datascience.stackexchange.com/a/807), it's really well written and should be able to clear it up for you! – hmhmmm Mar 28 '20 at 19:01

1 Answer


Both terms quantify the quality of a classification model, but in different ways: the accuracy summarizes a single set of predictions, which means it describes a single confusion matrix. The AUC (area under the curve) represents the trade-off between the true-positive rate (tpr) and the false-positive rate (fpr) across multiple confusion matrices, generated for different fpr values of the same classifier. A confusion matrix has the form:

[Figure: 2x2 confusion matrix with rows = actual class, columns = predicted class, and cells TP, FN, FP, TN]

1) The accuracy is a measure for a single confusion matrix and is defined as:

accuracy = (TP + TN) / (TP + FP + TN + FN)

where TP = true positives, TN = true negatives, FP = false positives and FN = false negatives (the count of each).
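
As a small illustration (the names `y_test` and `y_pred` are hypothetical hard class predictions, e.g. the output of something like `model.predict(X_test)`), the accuracy can be computed directly from these four counts or with scikit-learn's helper:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical example data: true labels and hard (already thresholded) predictions
y_test = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])

# The four cells of the binary confusion matrix
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

# accuracy = (TP + TN) / (TP + FP + TN + FN)
acc_manual = (tp + tn) / (tp + fp + tn + fn)
acc_sklearn = accuracy_score(y_test, y_pred)

print(acc_manual, acc_sklearn)  # identical
```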

2) The AUC measures the area under the ROC curve (receiver operating characteristic), which is the trade-off curve between the true-positive rate and the false-positive rate. For each choice of the decision threshold, a false-positive rate (fpr) and its corresponding true-positive rate (tpr) are obtained. I.e., for a given classifier, an fpr of 0, 0.1, 0.2 and so forth is accepted, and for each fpr the corresponding tpr is evaluated. You therefore get a function tpr(fpr) that maps the interval [0, 1] onto the same interval, because both rates are defined on that interval. The area under this curve is called the AUC; it lies between 0 and 1, and a random classifier is expected to yield an AUC of 0.5.

[Figures: example ROC curves plotting tpr against fpr, with the diagonal corresponding to a random classifier]

The AUC, as it is the area under the curve, is defined as:

$$\mathrm{AUC} = \int_0^1 \mathrm{tpr}(\mathrm{fpr}) \,\mathrm{d}\,\mathrm{fpr}$$

However, in real (and finite) applications, the ROC curve is a step function, and the AUC is determined by a weighted sum over these steps (in practice, a trapezoidal sum).
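
For concreteness, here is a minimal sketch with scikit-learn (the labels `y_test` and scores `y_score` are hypothetical example data, with `y_score` standing in for something like `model.predict_proba(X_test)[:, 1]`): it computes the ROC points and then the area, both with the dedicated scorer and as an explicit trapezoidal sum over the steps.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score, auc

# Hypothetical example data: true labels and predicted scores for class 1
y_test  = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.6, 0.8, 0.7, 0.4, 0.2, 0.9, 0.3])

# One (fpr, tpr) point per threshold: the empirical, step-shaped ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_score)

# Area under that curve, once via the dedicated scorer and once as a
# trapezoidal sum over the ROC steps; both give the same number
auc_direct = roc_auc_score(y_test, y_score)
auc_steps = auc(fpr, tpr)

print(auc_direct, auc_steps)
```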

Graphics are from Borgelt's Intelligent Data Mining Lecture.

  • Thank you, this is very helpful! One follow-up question: when I am evaluating the performance of classification algorithms, are you saying that the AUC score is a better metric to rely on than the accuracy score? My slight confusion is that some of my models differ in AUC and test accuracy scores. In this case, I wonder if I should prioritize the model that has the higher test accuracy or the higher AUC score. Thank you! – Arsik36 Mar 28 '20 at 20:50
  • @Arsik36 it's not that simple; in order to get the accuracy you need to define a specific *threshold* (see [Predict classes or class probabilities?](https://stackoverflow.com/questions/51367755/predict-classes-or-class-probabilities)), while AUC measures the performance across a whole range of such thresholds. See [here](https://stackoverflow.com/questions/47104129/getting-a-low-roc-auc-score-but-a-high-accuracy/47111246#47111246) and [here](https://stackoverflow.com/questions/51190809/high-auc-but-bad-predictions-with-imbalanced-data/51192702#51192702) for more details. – desertnaut Mar 28 '20 at 21:25
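
To make the point in the last comment concrete, a rough sketch (reusing the same hypothetical `y_test` and `y_score` as above): the accuracy changes with the probability threshold used to turn scores into classes, whereas the single AUC value is computed from the scores across all thresholds at once.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical example data: true labels and predicted probabilities for class 1
y_test  = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.6, 0.8, 0.7, 0.4, 0.2, 0.9, 0.3])

# Accuracy depends on the threshold chosen to convert scores into classes ...
for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_score >= threshold).astype(int)
    print(f"threshold={threshold}: accuracy={accuracy_score(y_test, y_pred):.3f}")

# ... while the AUC already covers the whole range of thresholds
print(f"AUC={roc_auc_score(y_test, y_score):.3f}")
```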