Obtaining threshold values from a ROC curve

Question

I have some models, using ROCR package on a vector of the predicted class percentages, I have a performance object. Plotting the performance object with the specifications "tpr", "fpr" gives me a ROC curve.

I'm comparing models at certain thresholds of false positive rate (x). I'm hoping to get the value of the true positive rate (y) out of the performance object. Even more, I would like to get the class percentage threshold that was used to generate that point.

the index number of the false positive rate (x-value) that is closest to the threshold without being above it, should give me the index number of the appropriate true positive rate (y-value). I'm not exactly sure how to get that index value.

And more to the point, how do i get the threshold of class probabilities that was used to make that point?

score 70 · Accepted Answer · answered May 02 '13 at 19:13

This is why str is my favorite R function:

library(ROCR)
data(ROCR.simple)
pred <- prediction( ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred,"tpr","fpr")
plot(perf)
> str(perf)
Formal class 'performance' [package "ROCR"] with 6 slots
  ..@ x.name      : chr "False positive rate"
  ..@ y.name      : chr "True positive rate"
  ..@ alpha.name  : chr "Cutoff"
  ..@ x.values    :List of 1
  .. ..$ : num [1:201] 0 0 0 0 0.00935 ...
      ..@ y.values    :List of 1
      .. ..$ : num [1:201] 0 0.0108 0.0215 0.0323 0.0323 ...
  ..@ alpha.values:List of 1
  .. ..$ : num [1:201] Inf 0.991 0.985 0.985 0.983 ...

Ahah! It's an S4 class, so we can use @ to access the slots. Here's how you make a data.frame:

cutoffs <- data.frame(cut=perf@alpha.values[[1]], fpr=perf@x.values[[1]], 
                      tpr=perf@y.values[[1]])
> head(cutoffs)
        cut         fpr        tpr
1       Inf 0.000000000 0.00000000
2 0.9910964 0.000000000 0.01075269
3 0.9846673 0.000000000 0.02150538
4 0.9845992 0.000000000 0.03225806
5 0.9834944 0.009345794 0.03225806
6 0.9706413 0.009345794 0.04301075

If you have an fpr threshold you want to hit, you can subset this data.frame to find maximum tpr below this fpr threshold:

cutoffs <- cutoffs[order(cutoffs$tpr, decreasing=TRUE),]
> head(subset(cutoffs, fpr < 0.2))
          cut       fpr       tpr
96  0.5014893 0.1495327 0.8494624
97  0.4997881 0.1588785 0.8494624
98  0.4965132 0.1682243 0.8494624
99  0.4925969 0.1775701 0.8494624
100 0.4917356 0.1869159 0.8494624
101 0.4901199 0.1962617 0.8494624

You're amazing. and thanks for mentioning str. I'll employ it should I be so stumped in the future. — Faydey, May 03 '13 at 01:03
I really like the interactive and iterative approach in this answer. — Clayton Stanley, May 04 '13 at 07:40
and if you want to automatically find the t-value giving the 50% cutoff: `cutoffs[findInterval(0.5, cu$tpr), 'cut']` as long as cutoffs is sorted in increasing order. — smci, Mar 08 '14 at 20:28
what if your cutoff isn't in the list? @smci answer for 50% cutoff outputs the values for `cutoff= 0.4997881` — PleaseHelp, Jun 17 '20 at 15:59
@PleaseHelp: sounds like a standard question about (polynomial) interpolation. Either see all the existing questions or ask a new question (you could link it here in comments). — smci, Jun 17 '20 at 21:09

Enrique Pérez Herrero · Answer 2 · 2023-03-12T19:37:11.817

23

Package pROC includes function coords for calculating the best threshold:

library(pROC)
my_roc <- roc(my_response, my_predictor)
coords(my_roc, "best", ret = "threshold")

edited Mar 12 '23 at 19:37

answered Oct 26 '16 at 18:53

Enrique Pérez Herrero

3,699
2
32
33

Artem Klevtsov · Answer 3 · 2017-03-16T02:34:08.583

2 solutions based on the ROCR and pROC packages:

threshold1 <- function(predict, response) {
    perf <- ROCR::performance(ROCR::prediction(predict, response), "sens", "spec")
    df <- data.frame(cut = perf@alpha.values[[1]], sens = perf@x.values[[1]], spec = perf@y.values[[1]])
    df[which.max(df$sens + df$spec), "cut"]
}
threshold2 <- function(predict, response) {
    r <- pROC::roc(response, predict)
    r$thresholds[which.max(r$sensitivities + r$specificities)]
}
data(ROCR.simple, package = "ROCR")
threshold1(ROCR.simple$predictions, ROCR.simple$labels)
#> [1] 0.5014893
threshold2(ROCR.simple$predictions, ROCR.simple$labels)
#> [1] 0.5006387

See also OptimalCutpoints package which provides many algorithms to find an optimal thresholds.

score 1 · Answer 4 · answered Aug 20 '22 at 00:23

Similar to @Artem's solution Basically the optimal threshold in a ROC curve is the widest part of the curve, or the point which gives maximum TPR while maintaining the lowest FPR FPR & TPR corresponding to best threshold - ROC curve

So one could also find the best threshold by finding the widest point or the point with maximum separation between TPR and FPR

Below is a quick solution using package ROSE

library(ROSE)
library(data.table)
threshold_data<-roc.curve(df$response,my_predictor,plotit = TRUE)
#Get TPR, FPR and corresponding threshold from roc.curve function and convert to dataframe 
threshold_data<-data.frame(TPR = threshold_data$false.positive.rate,
                       FPR = threshold_data$true.positive.rate,
                       threshold = threshold_data$thresholds)


# TPR       FPR  threshold       sep
# 1.0000000000 1.0000000       -Inf 0.0000000
# 0.7474009553 0.9820701 0.03405027 0.2346691
# 0.5869626300 0.9478403 0.08923265 0.3608776
# 0.4003933689 0.8777506 0.17368989 0.4773572
# 0.2225344198 0.7571312 0.25101859 0.5345968
# 0.1441416128 0.6495518 0.33035935 0.5054101
# 0.0868221411 0.5281174 0.44915920 0.4412952
# 0.0261309357 0.3390383 0.57857430 0.3129074
# 0.0089912897 0.2257539 0.76554635 0.2167626
# 0.0008429334 0.1140994 0.93730006 0.1132565
# 0.0000000000 0.0000000        Inf 0.0000000

threshold_data<-setDT(threshold_data)
threshold_data[,sep:=abs(FPR-TPR)]
best_threshold<-threshold_data[sep==max(sep),threshold]
#0.2510185

#Same result with package pROC
library(pROC)
my_curve <- roc(df$my_response,my_predictor)
coords(my_curve, "best", ret = "threshold")
#0.2510185

Adosey · Answer 5 · 2022-12-07T10:11:26.390

0

Following from Juilee Answer:

The definition of TPR and FPR in the dataframe are erroneous. Correcting them and re-posting the same answer as it is.

       threshold_data<-data.frame(FPR = 
                           threshold_data$false.positive.rate,
                   TPR = threshold_data$true.positive.rate, 
                   threshold = threshold_data$thresholds)

edited Dec 07 '22 at 10:11

answered Dec 07 '22 at 10:10

Adosey

1
1

Obtaining threshold values from a ROC curve

5 Answers5

Linked

Related