On numerous occasions I've been getting this error when trying to fit a gbm or rpart model. I was finally able to reproduce it consistently using publicly available data. I have noticed that the error happens only when using CV (or repeated CV); when I don't use any fit control, I don't get it. Can someone shed some light on why I keep getting this error?

fitControl= trainControl("repeatedcv", repeats=5)
ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
ds$sub = as.factor(ds$substance)
rpartFit1 <- train(homeless ~ female + i1 + sub + sexrisk + mcs + pcs, 
                   tcControl=fitControl, 
                   method = "rpart", 
                   data=ds)
StupidWolf
Fred R.
  • In my experience, when this error happened it was because some variables were factors rather than numeric. Another case might be where one of the variables is a character string. Try `sapply(your_data, class)` to check the column classes. – SabDeM Jul 28 '15 at 19:51
  • Thanks for your reply. This dataset and others that exhibit this error have some variables of factor class. But why does this matter? Can rpart not handle factor variables? And why does it fail only when using CV? – Fred R. Jul 28 '15 at 19:55
  • If they are numbers, try converting them to `numeric`; if they are `character`, try leaving them out of the model. Anyway, it is not a general error; I think it depends on which `method` train uses. Do not forget that `train` is not a model but just a wrapper that simplifies the syntax and lets you apply a ton of different models just by changing the `method` argument. – SabDeM Jul 28 '15 at 19:57
  • Is your data being split in the same way each time? I imagine that could lead to this sort of error. – Rorschach Jul 28 '15 at 20:05
  • This particular error seems to go away if you use the correct parameter name in the `train()` function. It should be `trControl=fitControl`, not `tcControl=fitControl`. This was obvious after looking at the `warnings()` generated. – MrFlick Jul 28 '15 at 21:00
  • @SabDeM You just saved me my sanity! Thank you! – pookie Jun 20 '16 at 15:12
  • Hi, I posted an answer but it was deleted. I was saying that the error computing RMSE probably comes from having infinite values in the training dataset. Tell me if this could be correct. – agenis Dec 05 '17 at 11:14
  • Remove everything from train() except formula, data & method and try it again! – Aniket Sawale Mar 28 '18 at 08:48
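MrFlick's point about `warnings()` can be illustrated with a toy cross-validation loop. This is a hypothetical, simplified stand-in, not caret's actual code: `fold_accuracy()` and `cv_accuracy()` are made-up names, iris replaces the HELP data, and only the rpart package is assumed. Each fold's fit is wrapped in `tryCatch()`, a failed fit is downgraded to a warning plus an `NA` score, and only when every fold has failed does the loop stop with a generic "metric values are missing" message. The real cause (the stray `tcControl`) is therefore visible only in the warnings.

```r
library(rpart)

# Hypothetical toy CV loop (illustration only, not caret's implementation).
set.seed(1)
n_folds <- 5
folds <- sample(rep(seq_len(n_folds), length.out = nrow(iris)))

# Fit on every fold except k and score on fold k. A failed fit becomes a
# warning and an NA accuracy instead of stopping the whole loop.
fold_accuracy <- function(k, ...) {
  in_train <- folds != k
  fit <- tryCatch(
    rpart(Species ~ ., data = iris[in_train, ], ...),
    error = function(e) {
      warning("fit failed for fold ", k, ": ", conditionMessage(e),
              call. = FALSE)
      NULL
    }
  )
  if (is.null(fit)) return(NA_real_)
  pred <- predict(fit, iris[!in_train, ], type = "class")
  mean(pred == iris$Species[!in_train])
}

# Average accuracy over the folds. Only if every fold failed does it stop,
# and that final message says nothing about the real cause.
cv_accuracy <- function(...) {
  acc <- vapply(seq_len(n_folds), fold_accuracy, numeric(1), ...)
  if (all(is.na(acc))) stop("all the Accuracy metric values are missing")
  mean(acc, na.rm = TRUE)
}

ok <- cv_accuracy()  # runs normally
bad <- tryCatch(cv_accuracy(tcControl = list()), error = function(e) e)
conditionMessage(bad)  # generic message; the per-fold warnings name tcControl
```

Without any extra arguments nothing unexpected reaches rpart, which mirrors the question's observation that the error disappears when no fit control is passed.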

1 Answer

There is a typo: it should be trControl instead of tcControl. When the argument is given as tcControl, train() does not recognize it as one of its own arguments, so caret passes it on to rpart, which throws an error because it has no such option either.

I guess this answers your question of why you get this error only when you try to use cross-validation in training: without the misspelled argument, nothing unexpected reaches rpart.
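The failure can be checked without caret at all. A minimal sketch, assuming only the rpart package and the built-in iris data (not the HELP data from the question): calling rpart() directly with the misspelled argument should reproduce the error that caret runs into inside each resample, while the same call without it fits normally.

```r
library(rpart)

# rpart() checks every extra named argument in `...` against the
# arguments of rpart.control(); an unknown name stops the fit.
bad <- tryCatch(
  rpart(Species ~ ., data = iris, tcControl = list()),
  error = function(e) e
)
conditionMessage(bad)  # the message names the stray tcControl argument

# Without the stray argument the same call fits normally.
fit <- rpart(Species ~ ., data = iris)
print(fit)
```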

Below is how it should work:

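library(caret)
library(mosaicData)

data(HELPrct)
ds = HELPrct
# 10-fold cross-validation, repeated 5 times
fitControl = trainControl(method = "repeatedcv", repeats = 5)
ds$sub = as.factor(ds$substance)

# note the argument name: trControl, not tcControl;
# rows with missing values are dropped before fitting
rpartFit1 <- train(homeless ~ female + i1 + sub + sexrisk + mcs + pcs,
                   trControl = fitControl,
                   method = "rpart",
                   data = ds[complete.cases(ds), ])

rpartFit1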
CART 

117 samples
  6 predictor
  2 classes: 'homeless', 'housed' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 105, 105, 105, 106, 105, 106, ... 
Resampling results across tuning parameters:

  cp          Accuracy   Kappa      
  0.00000000  0.5280303  -0.03503032
  0.01190476  0.5280303  -0.03503032
  0.07142857  0.5977273  -0.02970604

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.07142857.
StupidWolf