I'm trying to calculate the AUC using auc(roc(predictions, labels)), where labels is a numeric vector of 1 (x15) and 0 (x500), and predictions is a numeric vector of probabilities derived from a binomial glm. It should be very simple, but auc(roc(predictions, labels)) gives an error saying "Not enough distinct predictions to compute area under the ROC curve". I must be doing something silly, but I can't work out what. Can you?

The code is

library(AUC)
#read the data, which come from an earlier species distribution modelling step
prob<-read.csv("prob.csv")
labels<-read.csv("labels.csv")
#prob is
#labels is

roc(prob,labels)

#Gives the error (that I'm NOT interested in)
Error in `[.data.frame`(predictions, pred.order) : undefined columns selected
In addition: Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(e2) : is.na() applied to non-(list or vector) of type 'NULL'
3: In is.na(e2) : is.na() applied to non-(list or vector) of type 'NULL'

#I change the format to numeric vector
prob<-as.numeric(prob[,2])
labels<-as.numeric(labels[,2])
#Verify it is a vector numeric
class(prob)
[1] "numeric"
class(labels)
[1] "numeric"

#call the roc function
roc(prob,labels)

Error in roc(modbrapred, pbbra) : # THIS is the error I'm interested in
  Not enough distinct predictions to compute area under the ROC curve.
In addition: Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(e2) : is.na() applied to non-(list or vector) of type 'NULL'
3: In is.na(e2) : is.na() applied to non-(list or vector) of type 'NULL'    

Data is as follows

labels.csv
"","x"
"1",1
"2",1
"3",1
"4",1
"5",1
"6",1
...
"164",1
"165",1
"166",0
"167",0
"168",0
"169",0
"170",0
"171",0
"172",0 
...
"665",0

prob.csv
"","x"
"1",0.977465874525236
"2",0.989692657762578
"3",0.989692657762578
"4",0.988038430564019
"5",0.443188602491041
"6",0.409732585195485
...
"164",0.988607910625475
"165",0.986296936078692
"166",7.13529696560611e-05
"167",0.000419255989134081
"168",0.00295825183558019
"169",0.00182941235784709
"170",4.85601026999172e-09
"171",0.000953106471289961
"172",1.70252014430306e-05
...
"665",8.13413358866349e-08
user2942623
  • Can you please add a reproducible example? – dayne Sep 01 '14 at 14:53
  • Please read [how to create a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). You should edit your question to include code that we can copy/paste into R to get the same error. Be careful to include the `library()` calls required to get the code to run. You're right, it should be easy, so how exactly you've made it difficult is unclear. – MrFlick Sep 01 '14 at 16:17
  • Thanks for the comments. I have now included part of my real data. – user2942623 Sep 02 '14 at 02:05
  • But not the code that generates the error, and you still didn't specify which library you are using. As it stands the question cannot be answered. Please re-read MrFlick's comment until you fully understand it. – Calimo Sep 02 '14 at 06:41
  • Thanks for the pointer. I have now uploaded the code and data. – user2942623 Sep 05 '14 at 11:40

1 Answer

The problem was that my "labels" was a numeric vector, but roc needed a factor. So I transformed it:

labels <- factor(labels)

and the roc worked as it should
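
For anyone wanting to try the fix without the original files, here is a minimal sketch. It assumes the AUC package's roc(predictions, labels) expects labels as a factor with values 0 and 1; the data are synthetic stand-ins invented for illustration (not the real prob.csv / labels.csv), and the names labels_num, labels_fac and roc_obj are hypothetical. The missing levels on a numeric vector are plausibly what triggers the is.na()/NULL warnings, though that is an inference rather than something confirmed here.

```r
# Synthetic stand-ins for the real data (made-up values, for illustration only).
set.seed(42)
labels_num <- c(rep(1, 15), rep(0, 50))                # numeric 0/1 labels
prob       <- c(runif(15, 0.4, 1), runif(50, 0, 0.6))  # fake predicted probabilities

# A plain numeric vector carries no levels attribute.
is.null(levels(labels_num))

# Convert to a factor, stating the levels explicitly so their order is unambiguous.
labels_fac <- factor(labels_num, levels = c(0, 1))

# Call into the AUC package only if it is installed.
if (requireNamespace("AUC", quietly = TRUE)) {
  roc_obj <- AUC::roc(prob, labels_fac)
  print(AUC::auc(roc_obj))
}
```

Passing levels = c(0, 1) is optional for 0/1 data, but it makes the intended level order explicit instead of relying on the sort order of the values.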

Thanks for the time you dedicated

user2942623