I constructed a linear regression model from some training data and then tried to use the predict
function to predict values for a test dataset, but when I do I get the following error message:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  factor Overall.Qual has new levels 1, 2, 3
I've read in other threads that this occurs because the variable in the test dataset has a level that was not included in the model. But when I checked this using the 'summary' function, the variable has the same ten levels in both the training set and the test set.
This is the summary output for the training dataset:
  1   2   3   4   5   6   7   8   9  10
  1   5   9  58 268 209 168  88  24   4
And this is the output for the test dataset:
  1   2   3   4   5   6   7   8   9  10
  1   2  12  63 214 234 171  96  19   5
However, if I check which levels are actually included in the model using full_model$xlevels, it seems that the model is dropping levels 1, 2, and 3. I understand it dropping level 1, since it has only one data point, but I'm really confused as to why it's dropping the other two levels. Can anyone explain this? And is there a good way to fix it other than just removing those levels from the test dataset?
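
To make the comparison concrete, here is a minimal sketch on made-up data. The data frames train and test, the predictor Other, and the response y are hypothetical stand-ins, not the real dataset. The missing values in Other are an assumption, included only to show one way the levels declared on a factor and the levels stored in the model can differ; whether that applies to the real data is not confirmed.

```r
# Hypothetical training data: levels 1-3 occur only in rows where another
# predictor (Other) is missing. This is an assumed setup for illustration.
train <- data.frame(
  Overall.Qual = factor(c(1, 2, 3, 4, 5, 4, 5, 4), levels = 1:5),
  Other        = c(NA, NA, NA, 1, 2, 3, 4, 5),
  y            = c(1.1, 2.0, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9)
)

# Hypothetical test data containing every level.
test <- data.frame(
  Overall.Qual = factor(c(1, 2, 3, 4, 5), levels = 1:5),
  Other        = c(1, 2, 3, 4, 5)
)

full_model <- lm(y ~ Overall.Qual + Other, data = train)

# Levels declared on the factor in the training data
levels(train$Overall.Qual)

# Levels the fitted model actually stored
full_model$xlevels$Overall.Qual

# Levels present in the test data but absent from the model
missing_levels <- setdiff(levels(test$Overall.Qual),
                          full_model$xlevels$Overall.Qual)
missing_levels

# Capture the error from predict() instead of stopping the script
tryCatch(predict(full_model, newdata = test),
         error = function(e) conditionMessage(e))
```

Running the same setdiff() call on the real test set and full_model would list exactly the levels that predict complains about.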