
I built an xgboost model using the caret package, saved it, and intended to use it for prediction on new data. When I tried to predict on a new dataset, it turned out that the new data contained no observations with the value `unfinished` for the variable `condition_type`, although this value exists in the training set.

That led me to the following error:

Error in eval(predvars, data, env) : 
  object 'condition_type.unfinished' not found

Has anyone had the same problem, and could you advise me on what to do in this situation?

  • Please add sample data with `dput` and enough code to reproduce the issue. See [this post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for details. – NelsonGon May 11 '20 at 17:36
  • You could code these unseen categories as "other". When I have data with many categories, I usually reduce them to the n main ones and put the rest into "others"; this way your model can actually handle new categories in production. Another alternative is to code new categories as NA. If you need a technical explanation, you could ask this question on Cross Validated. You can't predict with categorical values that a model has never seen before, so you have to decide what this new data looks like or refuse to predict on it altogether. – Bruno May 11 '20 at 17:38
  • Hi Bruno, thank you for answering. In my case the training dataset had more values for the variable condition_type: it included the value 'unfinished'. The new dataset on which I need to make predictions happens not to contain 'unfinished', which is why I see the error. – kskirpic May 12 '20 at 16:23
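
The situation in the last comment, where the new data lacks a dummy column that the model was trained on, can probably be handled by padding the new data with the missing columns. Below is a minimal sketch in base R, assuming the predictors were one-hot encoded before training. The data frames `train_x` and `new_x` are hypothetical; only the column name `condition_type.unfinished` comes from the error message.

```r
# Hypothetical dummy-encoded predictors; only the column name
# 'condition_type.unfinished' is taken from the error message.
train_x <- data.frame(
  condition_type.finished   = c(1, 0, 1),
  condition_type.unfinished = c(0, 1, 0)
)
new_x <- data.frame(
  condition_type.finished = c(1, 1)
)

# Columns the model was trained on but the new data lacks
missing_cols <- setdiff(names(train_x), names(new_x))

# No new row has these levels, so the dummy value is 0 for every row
for (col in missing_cols) {
  new_x[[col]] <- 0
}

# Put the columns in the same order as in training
new_x <- new_x[, names(train_x), drop = FALSE]
```

After this step `new_x` has the same columns as `train_x` and can be passed to `predict()`. If the encoding was done with caret's `dummyVars`, an alternative that should give the same result is to set the factor levels of `condition_type` in the new data to the training levels and reuse the `dummyVars` object fitted on the training data.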
