I have split my data set into training and testing sets. I fit a regression on the training set and then call predict on the testing set. When I do this I get an error message that says: "Error in model.frame factor x has New Levels". I know this is because there are levels in my testing data that were not seen in my training data.
What I want to do is just eliminate or ignore the levels that aren't in both data sets. I've tried to do this, but it isn't setting any levels to NA, and the id object says "integer (empty)":
id <- which(!(test$x %in% levels(train$x)))
test$x[id] <- NA
fit <- lm(y ~ x, data=train)
P <- predict(fit,test)
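
One possible explanation for the empty id, sketched on made-up toy data: if train and test were both subset from the same data frame, the factor in train may still list every original level, even those with no rows left in train, so the comparison against levels(train$x) never finds anything new. The data frame dat and the helper names seen and id_by_levels below are assumptions for illustration, not part of my real data. Comparing against the values actually present in the training rows seems to behave as intended:

```r
# Toy data (hypothetical): level "c" occurs only in the test rows.
dat <- data.frame(
  x = factor(c("a", "b", "a", "b", "c", "a")),
  y = c(1.0, 2.0, 1.5, 2.5, 3.0, 1.2)
)
train <- dat[1:4, ]
test  <- dat[5:6, ]

# Subsetting keeps unused levels, so "c" is still listed for train.
levels(train$x)                                       # "a" "b" "c"
id_by_levels <- which(!(test$x %in% levels(train$x)))  # integer(0)

# Compare against the values that actually appear in the training rows.
seen <- unique(as.character(train$x))
id <- which(!(as.character(test$x) %in% seen))
test$x[id] <- NA          # mark test rows whose level was never trained on

fit <- lm(y ~ x, data = train)
P <- predict(fit, test)   # rows with an unseen level get NA predictions
```

Calling droplevels(train) before the comparison should be an alternative way to make levels(train$x) reflect only the levels that remain in the training rows.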