My data is too irregular for a standard ROC analysis to determine a threshold. To simplify, here is a demo. Let x be:

x<-c(0,0,0,12, 3, 4, 5, 15, 15.3, 20,18, 26)

Suppose x = 15.1 is the unknown true threshold, and the corresponding test outcome y is negative (0) if x == 0 or x > 15.1, and positive (1) otherwise, so that:

y<-c(0,0,0,1, 1, 1, 1, 1, 0,0,0,0)
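
As a quick sanity check, the rule above can be written directly in R and compared with y. This is only a minimal sketch; `y_rule` is a name introduced here for illustration, and 15.1 is the assumed "true" threshold of the demo.

```r
# Demo data from above
x <- c(0, 0, 0, 12, 3, 4, 5, 15, 15.3, 20, 18, 26)
y <- c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0)

# Positive (1) only when x is non-zero and does not exceed the threshold
y_rule <- as.integer(x != 0 & x <= 15.1)

# TRUE if the rule reproduces y exactly
all(y_rule == y)
```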

Because x = 0 also gives a negative outcome, I'm wondering how I can determine the threshold of x that predicts y best. I have tried the R packages pROC and ROCR, but neither seems straightforward for this situation. Does anybody have any suggestions?

David Z

1 Answer

You have a situation where you predict 0 for high values of x and 1 for low values of x, except that you always predict 0 if x == 0. Standard packages like pROC and ROCR expect low values of x to be associated with predicting y = 0. You could transform your data to fit that convention by:

  1. Flipping the sign of all your predictions
  2. Replacing the 0 values in x with a negative value that lies below every other flipped value (for example, -1000)

In code (using this answer to extract TPR and FPR for each cutoff):

x2 <- -x
x2[x2 == 0] <- -1000
library(ROCR)
pred <- prediction(x2, y)
perf <- performance(pred, "tpr", "fpr")
data.frame(cut=perf@alpha.values[[1]], fpr=perf@x.values[[1]], 
           tpr=perf@y.values[[1]])
#        cut       fpr tpr
# 1      Inf 0.0000000 0.0
# 2     -3.0 0.0000000 0.2
# 3     -4.0 0.0000000 0.4
# 4     -5.0 0.0000000 0.6
# 5    -12.0 0.0000000 0.8
# 6    -15.0 0.0000000 1.0
# 7    -15.3 0.1428571 1.0
# 8    -18.0 0.2857143 1.0
# 9    -20.0 0.4285714 1.0
# 10   -26.0 0.5714286 1.0
# 11 -1000.0 1.0000000 1.0

Now you can select your favorite cutoff based on the true and false positive rates, remembering that each cutoff is the negative of the corresponding threshold on the original x (a cutoff of -15 means predicting 1 for non-zero x <= 15).
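
If you want to pick the cutoff programmatically, one option is to maximize TPR minus FPR (Youden's J). That criterion is an assumption on my part, not something the question asks for, and the sketch below recomputes the rates in base R rather than through ROCR so that it runs on its own. The names `cuts`, `j`, `best_cut` and `threshold` are introduced here for illustration.

```r
# Data from the question
x <- c(0, 0, 0, 12, 3, 4, 5, 15, 15.3, 20, 18, 26)
y <- c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0)

# Same transformation as above: flip the sign, then push x == 0 far below
x2 <- -x
x2[x == 0] <- -1000

# Candidate cutoffs: each distinct transformed score; predict 1 when x2 >= cutoff
cuts <- sort(unique(x2), decreasing = TRUE)
tpr <- sapply(cuts, function(k) mean(x2[y == 1] >= k))
fpr <- sapply(cuts, function(k) mean(x2[y == 0] >= k))

# Youden's J (assumed selection criterion); which.max returns the first maximum
j <- tpr - fpr
best_cut <- cuts[which.max(j)]

# Undo the sign flip to get the threshold on the original scale
threshold <- -best_cut

# Predictions at the chosen cutoff
y_hat <- as.integer(x2 >= best_cut)
```

With the demo data this gives `threshold` equal to 15, and `y_hat` matches y. Note that this is the largest observed positive value, not 15.1 itself: with no observations between 15 and 15.3, the data can only place the threshold somewhere in that gap.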

josliber