
I'm using caret to train a gbm model in R. I've used the formula interface to exclude certain variables from my model:

gbmTune <- train(Outcome ~ . - VarA - VarB - VarC, data = train,
    method = "gbm",
    metric = "ROC",
    tuneGrid = gbmGrid,
    trControl = cvCtrl,
    verbose = FALSE)
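
One quick way to see which predictors a formula like this keeps is to inspect its expanded terms in base R. The sketch below uses a made-up data frame (`toy_train` and the column `VarD` are hypothetical, not from the original data). It only checks the formula itself, not what caret or gbm do with it, and it does not show whether the excluded columns are still evaluated when the model frame is built.

```r
# Hypothetical toy data standing in for the real training set.
toy_train <- data.frame(
  Outcome = factor(c("yes", "no", "yes", "no")),
  VarA = factor(c("a", "b", "a", "b")),
  VarB = c(1, 2, 3, 4),
  VarC = c(0.1, 0.2, 0.3, 0.4),
  VarD = c(10, 20, 30, 40)
)

# Expand the "." against the data and apply the "-" removals.
model_terms <- terms(Outcome ~ . - VarA - VarB - VarC, data = toy_train)

# The term labels are the predictors that remain on the right-hand side.
kept_predictors <- attr(model_terms, "term.labels")
print(kept_predictors)
```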

When I try to use predict() against my test set, R complains about new factor levels in a variable I asked to be excluded. The only workaround I've found is to remove those variables from the data (by setting them to NULL) before training the model. That doesn't seem like the right answer.
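
For reference, the workaround described above might look roughly like this in base R. This is a minimal sketch on made-up data (`toy_train`, `excluded`, and `VarD` are hypothetical names); the commented-out `train()` call simply reuses the arguments from the question with a plain `Outcome ~ .` formula.

```r
# Hypothetical toy data standing in for the real training set.
toy_train <- data.frame(
  Outcome = factor(c("yes", "no", "yes", "no")),
  VarA = factor(c("a", "b", "a", "b")),
  VarB = c(1, 2, 3, 4),
  VarC = c(0.1, 0.2, 0.3, 0.4),
  VarD = c(10, 20, 30, 40)
)

# Columns to leave out of the model.
excluded <- c("VarA", "VarB", "VarC")

# Drop the columns from the data instead of subtracting them in the formula.
train_reduced <- toy_train[, !(names(toy_train) %in% excluded), drop = FALSE]
print(names(train_reduced))

# The model would then be fit on the reduced data, e.g. (not run here):
# gbmTune <- train(Outcome ~ ., data = train_reduced, method = "gbm",
#                  metric = "ROC", tuneGrid = gbmGrid,
#                  trControl = cvCtrl, verbose = FALSE)
```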

I'm fairly new at this, so I would love to know what I'm doing wrong!

milos.ai
rsarac
  • Are there new levels of the variable in the testing set? If that's the case, then you should either remove those points, ignore the variable, or create an "other" category for any such variables, and find a criterion for assigning levels to it (e.g., if a level appears fewer than 50 times in a data set). – Max Candocia Apr 20 '15 at 20:28
  • Can I ask why it's even looking at those variables if I've asked the train function to exclude them with the "-" operator? – rsarac Apr 20 '15 at 20:31
  • I just tried running a train function on my own data set, and it appeared to fail to remove the variables that I requested. I'm guessing it doesn't parse that part of the formula correctly. If someone has a better explanation, please chime in. – Max Candocia Apr 20 '15 at 21:12
  • It would be nice to have a fully [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MrFlick Apr 20 '15 at 21:47
  • Without the full error message and str(), there is no basis for answering. Would also be good to name the package with the train fn. – IRTFM Apr 21 '15 at 01:26
  • Thank you all for your help and feedback. It will help me become a better member of stackoverflow. :) I'll put together a reproducible example and be back! – rsarac Apr 22 '15 at 20:02
  • Well, I put together an example, but it worked as I expected. When I use "-" to exclude a variable from my model, it's excluded. And yet, I was running into a problem with this before. I'll have to get a better grip on what I'm dealing with. My apologies for the poorly formed question / post. I really appreciate your help and advice thus far, everyone. – rsarac Apr 22 '15 at 20:15
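
The "other" category suggested in the first comment could be sketched in base R along these lines. Everything here is illustrative: the vectors, the helper `lump_rare`, and the threshold `min_count` are hypothetical, and the threshold of 2 stands in for whatever cutoff (such as the 50 mentioned above) suits the real data.

```r
# Hypothetical categorical values from a training and a testing set.
train_x <- c("a", "a", "a", "b", "b", "c")
test_x  <- c("a", "d", "b")

# Levels that occur in the test set but never in the training set.
unseen <- setdiff(unique(test_x), unique(train_x))
print(unseen)

# Keep only levels that occur at least min_count times in training.
min_count <- 2
counts <- table(train_x)
keep <- names(counts)[counts >= min_count]

# Map every non-missing value outside `keep` to "other", using the
# same level set for both data sets.
lump_rare <- function(x, keep) {
  x <- as.character(x)
  x[!is.na(x) & !(x %in% keep)] <- "other"
  factor(x, levels = c(keep, "other"))
}

train_lumped <- lump_rare(train_x, keep)
test_lumped  <- lump_rare(test_x, keep)
print(train_lumped)
print(test_lumped)
```

Because both factors are built from the same `keep` vector, the testing data cannot contain a level the training data lacks.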

0 Answers