I have a dataset I split into test/train datasets. Immediately following that split I produced a logistic model with:
logModel1 = glm(Y ~ . -var1 -var2 -var3, data=train, family=binomial)
If I use that model to make predictions on the same train set, I get no error (though of course that is not a very useful test of the model). So I used the code below to predict on my test set:
predictLog1 <- predict(logModel1, type="response", newdata=test)
But I get the following error:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  factor myCharVar has new levels This is an observation of myCharVar, This is another...
Here's what's got me particularly confused:
- myCharVar is a character variable in both my train and test sets. I've confirmed this with
str(test$myCharVar)
and
str(train$myCharVar)
- My model does not even use myCharVar as part of the prediction.
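A minimal sketch of the situation, using made-up toy data rather than the real dataset (the column x1, the sample sizes, and the string values are assumptions for illustration only). It should show myCharVar being a character column in both sets and absent from the fitted coefficients, while predict() on the test set still fails:

```r
# Toy stand-ins for the real train/test split; all values are invented.
set.seed(1)
train <- data.frame(Y = rbinom(30, 1, 0.5),
                    x1 = rnorm(30),
                    myCharVar = paste("train obs", 1:30),
                    stringsAsFactors = FALSE)
test <- data.frame(Y = rbinom(10, 1, 0.5),
                   x1 = rnorm(10),
                   myCharVar = paste("test obs", 1:10),
                   stringsAsFactors = FALSE)

# Same pattern as the model above: "." with the character column subtracted.
logModel1 <- glm(Y ~ . - myCharVar, data = train, family = binomial)

# The column is character in both sets ...
is.character(train$myCharVar)
is.character(test$myCharVar)

# ... and no coefficient refers to it ...
names(coef(logModel1))

# ... but the fitted object may still carry its training values as levels.
names(logModel1$xlevels)

# Capture the error message instead of stopping the script.
res <- tryCatch(
  predict(logModel1, type = "response", newdata = test),
  error = function(e) conditionMessage(e)
)
res
```

If names(logModel1$xlevels) lists myCharVar, that would be consistent with predict() checking the test values of that column against the training values even though the column contributes nothing to the fit.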
I found an explanation for bullet 2 at this SO link: "Factor has new levels" error for variable I'm not using
The suggestion there, to remove the character variables from my train and test sets altogether, has given me a workaround, so at least I'm not held up. But that seems pretty inelegant compared with just excluding them from the model with "-myCharVar". If anyone understands why a character variable in my test set would throw a "factor has new levels" error, I'd certainly be interested.
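For reference, the workaround of dropping the character column from both sets before fitting can be sketched as follows. The toy data, the helper make_data, the column x1, and the names train2, test2, logModel2 and predictLog2 are all hypothetical and only stand in for the real objects:

```r
# Toy stand-ins for the real train/test split; all values are invented.
set.seed(42)
make_data <- function(n, prefix) {
  data.frame(Y = rbinom(n, 1, 0.5),
             x1 = rnorm(n),
             myCharVar = paste(prefix, seq_len(n)),
             stringsAsFactors = FALSE)
}
train <- make_data(40, "train obs")
test <- make_data(10, "test obs")

# Remove the character column from both data frames up front,
# instead of subtracting it inside the formula.
drop_cols <- "myCharVar"
train2 <- train[, setdiff(names(train), drop_cols), drop = FALSE]
test2 <- test[, setdiff(names(test), drop_cols), drop = FALSE]

# With the column gone, "." no longer expands to include it.
logModel2 <- glm(Y ~ ., data = train2, family = binomial)
predictLog2 <- predict(logModel2, type = "response", newdata = test2)
head(predictLog2)
```

Listing the predictors explicitly in the formula (here Y ~ x1) instead of using "." should avoid touching the character column in the same way, at the cost of a longer formula when there are many predictors.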