
I am using RStudio and trying to use `roc` from the pROC package with `boot` for bootstrapping. I am following the code on this link. The code from that link uses a different function with `boot`, and it works fine. But when I try `roc`, it gives an error.

Below is my code. (In the output I print the dimensions of the sample to see how many times resampling is done. Here R=5, sampling is done 6 times, and then the error occurs.)

library(boot)
library(pROC)

roc_boot <- function(D, d) {
  E=D[d,]
  print(dim(E))
  return(roc(E$x,E$y))
}

x = round(runif(100))
y = runif(100)
D = data.frame(x, y)

b = boot(D, roc_boot, R=5)

Output:

[1] 100   2
[1] 100   2
[1] 100   2
[1] 100   2
[1] 100   2
[1] 100   2
Error in boot(D, roc_boot, R = 5) : 
  incorrect number of subscripts on matrix

What is the problem here?

If I replace `roc` with some other function such as `sum`, it works perfectly (it prints the 6 lines without any error). It also gives different answers when `boot` is run multiple times while keeping D the same.

Please note that the error occurs after all the resampling is done. I cannot find the source of this particular error. I have looked at other answers like this, but they don't seem to apply to my case. Can someone also explain why this error occurs and what it means in general?

EDIT: I returned only the area under the curve, using the following function:

roc_boot <- function(D, d) {
  E=D[d,]
  objectROC <- roc(E$x,E$y)
  return(objectROC$auc)
}

This returns the area under the curve, but the value is the same as the one I get without bootstrapping, meaning there is no improvement. I think I need to pass the whole roc object to get an improvement from bootstrapping.

StupidWolf
dc95
  • Are you looking to return the roc object, or calculate the area under the roc? The error occurs because boot is looking for a single value to be returned from the statistic argument. –  Jun 28 '16 at 22:00
  • @JimM. I want to return the object. Is that possible? I have returned area under curve using `$auc` of the roc object. But in that case, the area is same as area without bootstrapping. I suspect that I will need to pass the `roc` object to have some improvement due to bootstrapping. I will edit the question to add this. – dc95 Jun 28 '16 at 22:30
  • As an improvement to your code, I would suggest increasing the number of bootstraps somewhere in the neighbourhood of 1000 depending on the variability in your data. Simply by bootstrapping the roc objects won't lead to improvement (higher auc?) than if you just return the auc statistic. If you look at your boot object `b`, it will give you the difference between the mean of the bootstraps and the auc calculated on the original data set, aka bias. –  Jun 29 '16 at 00:49
  • What is it you want to achieve? Bootstrapping gives you a measure of the uncertainty of a statistic. This is not defined directly on the ROC curve, this is why you need the AUC for instance. – Calimo Jun 29 '16 at 07:28
  • By the way the edited code seems to work fine for me, and gives the AUC with bias and std.error as expected. What do you mean by "there is no improvement"? Bootstrapping doesn't improve your statistic, it only reports on its uncertainty. – Calimo Jun 29 '16 at 07:30

1 Answer


It turns out that you can't return a roc object from the `statistic` function passed to `boot`. The function has to return a numeric value (or a numeric vector). So the following modification gets rid of the error (as in the edit to the question):

roc_boot <- function(D, d) {
  E=D[d,]
  objectROC <- roc(E$x,E$y)
  return(objectROC$auc)
}
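
As a side note on the six printed lines in the question: `boot` calls the statistic once on the original data to get `t0`, and then once per resample. Each result is stored as a row of the numeric matrix `b$t`, which is presumably why a roc object (a list) cannot be stored and the error only shows up after all the resampling. A minimal sketch of the call count, using a hypothetical helper `counting_mean` and base R's `mean` instead of `roc`:

```r
library(boot)

calls <- 0

# Hypothetical statistic that counts how often boot() invokes it
counting_mean <- function(D, d) {
  calls <<- calls + 1
  mean(D$y[d])
}

D <- data.frame(y = runif(100))
b <- boot(D, counting_mean, R = 5)

calls      # 6: one call on the original data plus R = 5 resamples
dim(b$t)   # 5 x 1: one row per resample, one column per returned value
```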

Moreover, as @Calimo pointed out, bootstrapping does not improve the estimate itself; it only affects the measure of its uncertainty. In my case, there is a slight improvement in the confidence interval.
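
To actually get a confidence interval out of the boot object, something like the sketch below should work. It is not the exact code I ran: the function name `auc_boot`, the sample size, the seed, and `R = 1000` are illustrative choices, and fixing `levels` and `direction` is an assumption on my part so that every resample is scored the same way. The `quiet` argument needs a reasonably recent pROC.

```r
library(boot)
library(pROC)

# Statistic for boot(): returns the AUC as a plain numeric value
auc_boot <- function(D, d) {
  E <- D[d, ]
  # Assumption: fix levels and direction so resamples are comparable;
  # quiet = TRUE suppresses pROC's informational messages
  r <- roc(E$x, E$y, levels = c(0, 1), direction = "<", quiet = TRUE)
  as.numeric(auc(r))
}

set.seed(1)
x <- round(runif(200))
y <- runif(200)
D <- data.frame(x, y)

b <- boot(D, auc_boot, R = 1000)

b$t0                        # AUC on the original data
sd(b$t[, 1])                # bootstrap standard error of the AUC
boot.ci(b, type = "perc")   # percentile confidence interval
```

pROC can also bootstrap the AUC itself through `ci.auc(..., method = "bootstrap")`, which may be simpler if only the interval is needed.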

dc95