Caret "Error in train.default(x, y, weights = w, ...) : final tuning parameters could not be determined" when optimizing for ROC

Question

I'm trying to create a binary classifier, modelling with caret to optimize ROC. The method I was attempting was C5.0 and I get the following error and warning:

Error in train.default(x, y, weights = w, ...) : 
  final tuning parameters could not be determined
In addition: Warning messages:
1: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.
2: In train.default(x, y, weights = w, ...) :
  missing values found in aggregated results

I had modelled the same training data with C5.0 and caret earlier but optimizing for Accuracy and not using twoClassSummary in the control, and it ran without error.

My tuning grid and control for the ROC run were:

c50Grid <- expand.grid(.trials = c(1:9, (1:10)*10),
                       .model = c("tree", "rules"),
                       .winnow = c(TRUE, FALSE))

fitTwoClass <- trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 5,
  classProbs=TRUE,
  summaryFunction = twoClassSummary
  )

During the Accuracy run, I omitted the classProbs and summaryFunction parts of the control.

For the modeling, the command was

fitModel <- train(
  Unhappiness ~ .,
  data = dnumTrain,
  tuneGrid=c50Grid,
  method = "C5.0",
  trControl = fitTwoClass,
  tuneLength = 5,
  metric= "ROC"
  )

Can anyone advise how to troubleshoot this? I'm not sure which parameter, if any, needs tweaking to make this work. I believe the dataset itself should be OK, since it ran fine when optimizing for Accuracy.

To reproduce, training set dnumTrain can be loaded from the file in this link.

Do you get error and warning when running the above example? — , Jul 29 '15 at 03:47
Yes, the error and warning are the ones at the top of the question — Ricky, Jul 29 '15 at 04:00
I am able to run it (it takes a long time) without any warning or error. — , Jul 29 '15 at 04:09
hm that is strange. I'm currently trying it with `ctree` instead of `C5.0`. It at first failed, then after I updated `caret` it worked, then afterwards it failed again. — Ricky, Jul 29 '15 at 04:23
I used `C50` version 0.1.0-24, `pROC` version 1.8 and `caret` version 6.0-52 on `R` version 3.2.1 Patched. `fitModel` says `ROC was used to select the optimal model using the largest value. The final values used for the model were trials = 30, model = rules and winnow = FALSE.` — , Jul 29 '15 at 05:13

score 2 · Accepted Answer · edited May 23 '17 at 12:16

I think I may have solved this. After seeing in the comments that @Pascal was able to run the code without error, and realising that I got fairly random results when running it with ctree, I looked into other sources of randomness, namely the random seed.

It seems the problem comes from my parallelising the process across 4 processors with doSNOW: the seed needs to be set for each resampling iteration to keep randomness from creeping in (see answer to this question). I suspect the unseeded randomness leaves some folds with no valid values.

In any case I set the seeds as below:

CVfolds <- 5
CVreps <- 5
tuneLength <- 5  # the tuneLength value passed to train()
seedNum <- CVfolds * CVreps + 1
seedLen <- CVfolds + tuneLength
# create manual seeds list for repeatability under parallel processing
set.seed(123)
seeds <- vector(mode = "list", length = seedNum)
for (i in 1:(seedNum - 1)) seeds[[i]] <- sample.int(1000, seedLen)
## For the last model:
seeds[[seedNum]] <- sample.int(1000, 1)

fitTwoClass <- trainControl(
  method = "repeatedcv",
  number = CVfolds,
  repeats = CVreps,
  classProbs=TRUE,
  summaryFunction = twoClassSummary,
  seeds = seeds
  )

So far I have re-trained fitModel 3 times and no error/warning yet, so I hope this is indeed the answer to my problem.
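
The seed list above can also be wrapped in a small helper. The sketch below is base R only; `make_cv_seeds` is a hypothetical name, not a caret function. It follows the shape described in the `trainControl` documentation for `seeds`: a list with one integer vector per resample, plus a single integer for the final model. Sizing each vector by `nrow(c50Grid)` is an assumption made here as a conservative upper bound on the number of models evaluated per resample; the answer itself uses `CVfolds + tuneLength`.

```r
# Hypothetical helper (not part of caret): build a list suitable for
# trainControl(seeds = ...). One integer vector per resample, then a
# single integer for the final model fit.
make_cv_seeds <- function(n_resamples, n_models, master_seed = 123) {
  set.seed(master_seed)
  seeds <- vector(mode = "list", length = n_resamples + 1)
  for (i in seq_len(n_resamples)) {
    seeds[[i]] <- sample.int(1000, n_models)
  }
  seeds[[n_resamples + 1]] <- sample.int(1000, 1)
  seeds
}

# Tuning grid from the question
c50Grid <- expand.grid(.trials = c(1:9, (1:10) * 10),
                       .model = c("tree", "rules"),
                       .winnow = c(TRUE, FALSE))

CVfolds <- 5
CVreps <- 5

# Assumption: one seed per grid row is at least as many as caret needs
seeds <- make_cv_seeds(CVfolds * CVreps, nrow(c50Grid))
```

The resulting `seeds` object would be passed as `seeds = seeds` in `trainControl`, exactly as in the control above. Because the helper calls `set.seed` itself, calling it again with the same arguments returns the same list.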

You didn't mention you were working in parallel environment. — , Jul 29 '15 at 07:01
Indeed I didn't; it didn't even occur to me that it may be a factor. I thought I could reproduce more concisely just the models and parameters rather than my whole script (which is quite big). Thanks for helping set me on the right track with your feedback that it actually works elsewhere. — Ricky, Jul 29 '15 at 07:17
There is a `seeds` option in `trainControl` that will set the seeds in the workers. — topepo, Jul 29 '15 at 20:07
