Questions tagged [auc]

The area under the ROC curve (AUC) can be thought of as a single scalar summary of the ROC curve itself. Since this value represents part of the area of a 1x1 square, the AUC lies between 0.0 and 1.0. However, since a useful classifier should perform better than random, AUC values of practical interest usually fall between 0.5 and 1.0 (a classifier scoring below 0.5 ranks the classes backwards and can simply be inverted). The AUC of a classifier is equivalent to the probability that the classifier will rank a randomly chosen positive data point higher than a randomly chosen negative data point [Fawcett, 2006]. The AUC is also related to the Gini coefficient [Hand and Till, 2001]. In practice it is estimated by trapezoidal approximation, summing the trapezoids formed between consecutive ROC points [Hand and Till, 2001].

Fawcett, Tom. 2006. “An Introduction to ROC Analysis.” Pattern Recognition Letters 27 (8) (June): 861–874. doi:10.1016/j.patrec.2005.10.010.

Hand, David J, and Robert J Till. 2001. “A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems.” Machine Learning 45 (2) (January 1): 171–186. doi:10.1023/A:1010920819831.
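
The trapezoidal estimate mentioned in the tag wiki can be sketched directly. A minimal, dependency-free Python illustration (the FPR/TPR points below are made up for the example):

```python
def trapezoidal_auc(fpr, tpr):
    """Estimate AUC by summing the trapezoids between consecutive ROC
    points. fpr and tpr must be sorted in increasing order of fpr."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        avg_height = (tpr[i] + tpr[i - 1]) / 2.0
        area += width * avg_height
    return area

# A perfect classifier's ROC curve passes through (0, 1):
print(trapezoidal_auc([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]))  # 1.0
# The chance-level diagonal:
print(trapezoidal_auc([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))  # 0.5
```

The more ROC points (i.e. distinct score thresholds), the closer this estimate gets to the true area.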

535 questions
65
votes
4 answers

F1 Score vs ROC AUC

I have the below F1 and AUC scores for 2 different cases Model 1: Precision: 85.11 Recall: 99.04 F1: 91.55 AUC: 69.94 Model 2: Precision: 85.1 Recall: 98.73 F1: 91.41 AUC: 71.69 The main motive of my problem to predict the positive cases…
user3342643
  • 729
  • 1
  • 7
  • 7
52
votes
11 answers

Calculate AUC in R?

Given a vector of scores and a vector of actual class labels, how do you calculate a single-number AUC metric for a binary classifier in the R language or in simple English? Page 9 of "AUC: a Better Measure..." seems to require knowing the class…
Andrew
  • 1,619
  • 3
  • 19
  • 24
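
The question asks for R, but the underlying arithmetic is language-agnostic: AUC equals the fraction of (positive, negative) pairs the scores rank correctly, counting ties as half. A sketch in Python (the labels and scores are hypothetical; the same loop ports line-for-line to R):

```python
def auc_from_scores(labels, scores):
    """AUC via the Mann-Whitney U formulation: the probability that a
    randomly chosen positive outscores a randomly chosen negative.
    Pairwise version, O(n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

print(auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

For large samples a rank-based O(n log n) formulation is preferable, but the pairwise version makes the probabilistic interpretation explicit.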
35
votes
3 answers

What is a threshold in a Precision-Recall curve?

I am aware of the concept of Precision as well as the concept of Recall. But I am finding it very hard to understand the idea of a 'threshold' which makes any P-R curve possible. Imagine I have a model to build that predicts the re-occurrence (yes…
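
One way to see where thresholds come from: every distinct predicted score can serve as a cutoff, and each cutoff yields one (precision, recall) point on the curve. A minimal sketch with made-up labels and scores:

```python
def pr_points(labels, scores):
    """Sweep each distinct score as a threshold; predict positive when
    score >= threshold, and record (threshold, precision, recall)."""
    total_pos = sum(labels)
    points = []
    for t in sorted(set(scores), reverse=True):
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / total_pos
        points.append((t, precision, recall))
    return points

for t, prec, rec in pr_points([0, 1, 1, 0, 1], [0.2, 0.9, 0.6, 0.55, 0.4]):
    print(f"threshold={t:.2f}  precision={prec:.2f}  recall={rec:.2f}")
```

Lowering the threshold can only keep or raise recall, while precision may move either way, which is why P-R curves are typically not monotone.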
28
votes
8 answers

Easy way of counting precision, recall and F1-score in R

I am using an rpart classifier in R. The question is - I would want to test the trained classifier on a test data. This is fine - I can use the predict.rpart function. But I also want to calculate precision, recall and F1 score. My question is - do…
Karel Bílek
  • 36,467
  • 31
  • 94
  • 149
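
The question targets R, but the metrics themselves are three lines of arithmetic over the confusion matrix. A sketch in Python with hypothetical label vectors (the formulas port directly to R):

```python
def prf1(actual, predicted, positive=1):
    """Precision, recall, and F1 for one class from paired label vectors."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(prf1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # (2/3, 2/3, 2/3)
```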
26
votes
5 answers

roc_auc_score - Only one class present in y_true

I am doing k-fold CV on an existing dataframe, and I need to get the AUC score. The problem is - sometimes the test data only contains 0s, and not 1s! I tried using this example, but with different numbers: import numpy as np from sklearn.metrics…
bloop
  • 421
  • 1
  • 5
  • 8
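
AUC is genuinely undefined when y_true contains only one class, since there are no positive-negative pairs to rank. A common workaround is to flag or skip such folds; a dependency-free sketch (toy data, pairwise AUC used for illustration):

```python
import math

def safe_auc(y_true, y_score):
    """Return AUC, or NaN when only one class is present (AUC undefined)."""
    if len(set(y_true)) < 2:
        return math.nan  # no positive/negative pairs to compare
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(safe_auc([0, 0, 0], [0.1, 0.2, 0.3]))          # nan
print(safe_auc([0, 1, 0, 1], [0.1, 0.9, 0.3, 0.7]))  # 1.0
```

Alternatively, stratified splitting usually prevents single-class folds in the first place.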
22
votes
4 answers

How to compute AUC with ROCR package

I have fitted a SVM model and created the ROC curve with ROCR package. How can I compute the Area Under the Curve (AUC)? set.seed(1) tune.out=tune(svm ,Negative~.-Positive, data=trainSparse, kernel…
mac gionny
  • 333
  • 1
  • 3
  • 8
18
votes
2 answers

Getting a low ROC AUC score but a high accuracy

Using a LogisticRegression class in scikit-learn on a version of the flight delay dataset. I use pandas to select some columns: df = df[["MONTH", "DAY_OF_MONTH", "DAY_OF_WEEK", "ORIGIN", "DEST", "CRS_DEP_TIME", "ARR_DEL15"]] I fill in NaN values…
Jon
  • 2,644
  • 1
  • 22
  • 31
18
votes
4 answers

Reason of having high AUC and low accuracy in a balanced dataset

Given a balanced dataset (both classes the same size), fitting an SVM model yields a high AUC value (~0.9) but a low accuracy (~0.5). I have no idea why this would happen - can anyone explain this case for me?
Jamin
  • 329
  • 1
  • 4
  • 10
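
This situation is possible because AUC measures ranking quality only, while accuracy also depends on where the decision threshold falls. A toy illustration (made-up scores): a model that ranks the classes perfectly but pushes every score above the default 0.5 cutoff gets perfect AUC and chance-level accuracy.

```python
y_true = [0, 0, 1, 1]
scores = [0.6, 0.7, 0.8, 0.9]  # perfect ranking, but all above 0.5

# Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.
pos = [s for y, s in zip(y_true, scores) if y == 1]
neg = [s for y, s in zip(y_true, scores) if y == 0]
auc = sum(p > n for p in pos for n in neg) / (len(pos) * len(neg))

# Accuracy at the default 0.5 threshold: everything is predicted positive.
preds = [1 if s >= 0.5 else 0 for s in scores]
acc = sum(p == y for p, y in zip(preds, y_true)) / len(y_true)

print(auc, acc)  # 1.0 0.5
```

Recalibrating the scores or tuning the threshold fixes the accuracy without changing the AUC at all.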
18
votes
2 answers

plot.roc for multiclass.roc in pROC package?

I am trying to plot multiclass ROC curves but I have not found anything fruitful in the pROC package. Here's some start code: data(iris) library(randomForest) library(pROC) set.seed(1000) # 3-class in response variable rf = randomForest(Species~.,…
AngryPanda
  • 1,261
  • 2
  • 19
  • 42
15
votes
2 answers

sklearn roc_auc_score with multi_class=="ovr" should have None average available

I'm trying to compute the AUC score for a multiclass problem using the sklearn's roc_auc_score() function. I have a prediction matrix of shape [n_samples,n_classes] and a ground truth vector of shape [n_samples], named np_pred and np_label…
Dario Mantegazza
  • 153
  • 1
  • 1
  • 5
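
If per-class (unaveraged) one-vs-rest AUCs are needed, they can be computed manually: binarize the labels per class and score each class against its own probability column. A sketch with hypothetical data (the pairwise AUC helper is for illustration only):

```python
def pairwise_auc(y_bin, scores):
    """Binary AUC via pairwise comparison; ties count as half."""
    pos = [s for y, s in zip(y_bin, scores) if y == 1]
    neg = [s for y, s in zip(y_bin, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def per_class_ovr_auc(y_true, prob_matrix, n_classes):
    """One-vs-rest AUC per class: class k vs. the rest, scored by column k."""
    return [
        pairwise_auc([1 if y == k else 0 for y in y_true],
                     [row[k] for row in prob_matrix])
        for k in range(n_classes)
    ]

y = [0, 1, 2, 1, 0]
probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
    [0.45, 0.25, 0.3],
    [0.6, 0.3, 0.1],
]
print(per_class_ovr_auc(y, probs, 3))  # [1.0, 0.8333..., 1.0]
```

Averaging this list (optionally weighted by class prevalence) reproduces the macro/weighted behaviour of roc_auc_score.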
15
votes
5 answers

How to plot multiple ROC curves in one plot with legend and AUC scores in python?

I am building 2 models. Model 1 modelgb = GradientBoostingClassifier() modelgb.fit(x_train,y_train) predsgb = modelgb.predict_proba(x_test)[:,1] metrics.roc_auc_score(y_test,predsgb, average='macro', sample_weight=None) Model 2 model =…
Learner_seeker
  • 544
  • 1
  • 4
  • 21
14
votes
1 answer

Why when I use GridSearchCV with roc_auc scoring, the score is different for grid_search.score(X,y) and roc_auc_score(y, y_predict)?

I am using stratified 10-fold cross validation to find model that predicts y (binary outcome) from X (X has 34 labels) with the highest auc. I set the GridSearchCV: log_reg = LogisticRegression() parameter_grid = {'penalty' : ["l1", "l2"],'C':…
huda95x
  • 149
  • 1
  • 1
  • 5
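
A likely cause of such mismatches: the roc_auc scorer ranks by predicted probabilities, whereas roc_auc_score(y, y_predict) on hard 0/1 predictions collapses all predicted positives to a single rank. A sketch of the difference with toy data (pairwise AUC used for illustration):

```python
def pairwise_auc(y_true, scores):
    """Binary AUC via pairwise comparison; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
probs = [0.1, 0.6, 0.4, 0.9]                   # ranking information kept
hard = [1 if p >= 0.5 else 0 for p in probs]   # ranking information lost

print(pairwise_auc(y_true, probs))  # 0.75
print(pairwise_auc(y_true, hard))   # 0.5
```

So for a fair comparison, score with the probability column (e.g. predict_proba), not with the thresholded class labels.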
10
votes
1 answer

Comparing AUC, log loss and accuracy scores between models

I have the following evaluation metrics on the test set, after running 6 models for a binary classification problem: accuracy logloss AUC 1 19% 0.45 0.54 2 67% 0.62 0.67 3 66% 0.63 0.68 4 67% 0.62 0.66 5 63%…
quant
  • 4,062
  • 5
  • 29
  • 70
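
For reference when reading such comparisons, binary log loss is the mean negative log-likelihood of the true labels under the predicted probabilities, so it penalizes confident wrong predictions heavily. A minimal implementation (toy inputs):

```python
import math

def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of the true labels under the
    predicted probabilities, clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(round(log_loss([1, 0, 1], [0.9, 0.1, 0.8]), 4))  # 0.1446
```

Note also that an accuracy far below 50% on a binary problem (like model 1's 19%) usually signals inverted labels or a mis-set threshold rather than a genuinely worse ranker, which is consistent with its near-chance AUC.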
10
votes
1 answer

High AUC but bad predictions with imbalanced data

I am trying to build a classifier with LightGBM on a very imbalanced dataset. Imbalance is in the ratio 97:3, i.e.: Class 0 0.970691 1 0.029309 Params I used and the code for training is as shown below. lgb_params = { …
Sreeram TP
  • 11,346
  • 7
  • 54
  • 108
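
With a 97:3 imbalance, the default 0.5 cutoff often predicts almost everything negative even when the ranking (and hence AUC) is good. A common remedy is to choose the threshold on a validation set, e.g. by maximizing F1; a dependency-free sketch with made-up scores:

```python
def f1_at(labels, scores, t):
    """F1 score when predicting positive for score >= t."""
    preds = [1 if s >= t else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def best_threshold(labels, scores):
    """Try each distinct score as a cutoff; keep the one maximizing F1."""
    return max(sorted(set(scores)), key=lambda t: f1_at(labels, scores, t))

labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]   # imbalanced toy data
scores = [0.05, 0.1, 0.1, 0.2, 0.15, 0.1, 0.3, 0.45, 0.2, 0.4]
t = best_threshold(labels, scores)
print(t, f1_at(labels, scores, t))
```

Class weights or resampling during training are complementary options, but threshold tuning alone often recovers usable predictions from a high-AUC model.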
9
votes
1 answer

How to calculate AUC with tensorflow?

I've built a binary classifier using Tensorflow and now I would like to evaluate the classifier using AUC and accuracy. As far as accuracy is concerned, I can easily do like this: X = tf.placeholder('float', [None, n_input]) y =…
mickkk
  • 1,172
  • 2
  • 17
  • 38