Using metric ROC in caret train function in R

Question

I have an imbalanced data set with two classes, so I thought I could use ROC instead of Accuracy as the metric to tune my model in R with the caret package (I am trying different methods such as rpart, rf, etc.). I assumed caret could extract class probabilities and use ROC as the metric for decision-tree-type algorithms as well. I illustrate my problem below using the built-in iris data set. It has three classes, but I merged two of them to create a two-class outcome for illustration purposes. I don't understand why the code below gives this error (I keep getting the same error when I change the method). I appreciate your help.

Error in train.default(x, y, weights = w, ...) : final tuning parameters could not be determined

In addition: Warning messages:

In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures.

In train.default(x, y, weights = w, ...) : missing values found in aggregated results

library(caret)

data(iris)
iris$Species                                   <- as.character(iris$Species)
iris$Species[which(iris$Species=='virginica')] <- 'versicolor'
iris$Species                                   <- as.factor(iris$Species)
x                                              <- iris[, !(colnames(iris) == "Species")]
y                                              <- iris$Species

fitControl <- trainControl(method = "cv", number=5, classProbs = TRUE, 
                           summaryFunction = twoClassSummary)

RF <- train(y = y, x=x,
            method="rpart",metric="ROC",
            trControl=fitControl)

I've seen these errors often when using `caret`. They are often not very descriptive of what the actual underlying problem is. I will see if I can reproduce it in this case. — Hack-R, May 27 '15 at 17:48
So far I am able to reproduce the problem, but not to fix it. I wonder if the size of the data is too small. Perhaps @Vlo could share his results and session information for comparison? — Hack-R, May 27 '15 at 18:03
I tried again and it still gives an error. @Vlo, did you use the exact code I posted? It is interesting that you didn't have any problems. Do any of you have another example you can share where you used ROC as the metric to tune your model? Thanks. — KTY, May 27 '15 at 22:01
@KTY @Hack-R dput(RF) is way too large to post even for an answer. I ran the code verbatim with no changes. `R version 3.1.1 (2014-07-10) Platform: x86_64-w64-mingw32/x64 (64-bit)` `[1] rpart_4.1-8 caret_6.0-41 ggplot2_1.0.0 lattice_0.20-29 reshape2_1.4.1 pracma_1.8.3 [7] foreach_1.4.2 dplyr_0.4.0` — Vlo, May 28 '15 at 17:17
@Vlo It's really hard to answer something so dataset-specific without the data. Is it possible to look at the datasets in `data()` and apply the logic of your function to one of those and see if you have the same problem, then post that? This would allow us to all work with the same dataset. — Hack-R, May 28 '15 at 20:36
@Hack-R I don't completely understand what you are suggesting. Using OP's code, I now only change `x` and `y` to another built-in dataset. `y = as.factor(ifelse(airquality$Month>6, "Boo", "Yay"));x = airquality[,-5]` The code still runs without error for me. — Vlo, May 28 '15 at 20:48
@Vlo Oh, I'm so sorry, I got confused. I thought you were OP when I wrote that. — Hack-R, May 29 '15 at 03:11
@Hack-R I don't think the problem I am facing is data-specific, as the same code works for Vlo. I also tried the airquality data and I get the same error. Does my code work for you as well? Thanks. — KTY, May 29 '15 at 21:45
Based on this question being asked in multiple areas, I would say that the code you posted isn't generating the error (i.e. that's not exactly the code that you used). Most likely your outcome still has the original three factor levels, and ROC curves require a factor with exactly two levels. Max — topepo, May 31 '15 at 16:35
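
The diagnosis in the last comment can be checked before calling `train()`. The following is a minimal base-R sketch, not part of caret: `check_two_class` is a hypothetical helper name introduced here for illustration. It assumes the outcome must be a factor with exactly two levels for `twoClassSummary`, and that the level names must be valid R names when `classProbs = TRUE`.

```r
# Rebuild the two-class outcome from the question (base R only).
data(iris)
y <- as.character(iris$Species)
y[y == "virginica"] <- "versicolor"
y <- factor(y)

# Hypothetical helper (not a caret function): validates an outcome before
# train(..., metric = "ROC") with twoClassSummary and classProbs = TRUE.
check_two_class <- function(y) {
  y  <- droplevels(as.factor(y))   # drop levels that have no observations
  lv <- levels(y)
  if (length(lv) != 2) {
    stop("ROC needs exactly 2 classes, found ", length(lv), ": ",
         paste(lv, collapse = ", "))
  }
  if (!identical(lv, make.names(lv))) {
    stop("class levels must be valid R names when classProbs = TRUE")
  }
  y
}

# Passes: the merged outcome has two levels.
y_checked <- check_two_class(y)
print(table(y_checked))

# Fails: the original Species factor still has three levels.
three_level <- tryCatch(check_two_class(iris$Species),
                        error = function(e) conditionMessage(e))
print(three_level)
```

If the check passes, `y_checked` can be given to `train()` as `y`. If it fails, the outcome that actually reached `train()` is not the two-class factor shown in the question, which would be consistent with the missing ROC values reported in the warnings.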
