library(ggplot2)
set.seed(1)
train.ind <- sample(1:nrow(mpg), round(nrow(mpg)/2))
lm_mod <- lm(displ ~ ., data = mpg[train.ind, ])
lm_pred <- predict(lm_mod, mpg[-train.ind, ])

Running the last line gives me the error:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor model has new levels land cruiser wagon 4wd

I am confused because, as far as I can tell, the mpg data has no factor variables; its non-numeric columns are all character:

> str(mpg)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   234 obs. of  11 variables:
 $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
 $ model       : chr  "a4" "a4" "a4" "a4" ...
 $ displ       : num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
 $ year        : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
 $ cyl         : int  4 4 4 4 6 6 6 4 4 4 ...
 $ trans       : chr  "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
 $ drv         : chr  "f" "f" "f" "f" ...
 $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
 $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...
 $ fl          : chr  "p" "p" "p" "p" ...
 $ class       : chr  "compact" "compact" "compact" "compact" ...

How can I go about fixing this?

Adrian
  • Possible duplicate of [predict.lm() with an unknown factor level in test data](https://stackoverflow.com/questions/4285214/predict-lm-with-an-unknown-factor-level-in-test-data) – Yannis Vassiliadis Apr 22 '18 at 21:11
  • I think the characters are being converted to factors. Try `lm_mod$xlevels`. – Dan Apr 22 '18 at 21:34
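
A minimal sketch of what the comment above points at, assuming the same split as in the question. `lm()` treats each character column as a factor and stores the levels it saw in `lm_mod$xlevels`; `predict()` then fails on any test row with a value missing from that list. The names `train`, `test` and `seen` are illustrative and not from the original post, and skipping the affected rows is only one possible workaround, not necessarily the best one.

```r
library(ggplot2)  # provides the mpg data set

set.seed(1)
train.ind <- sample(seq_len(nrow(mpg)), round(nrow(mpg) / 2))
train <- mpg[train.ind, ]   # illustrative name
test  <- mpg[-train.ind, ]  # illustrative name

lm_mod <- lm(displ ~ ., data = train)

# Character columns that lm() converted to factors, with the levels it saw
names(lm_mod$xlevels)

# TRUE for test rows whose values were all seen during training
seen <- rep(TRUE, nrow(test))
for (v in names(lm_mod$xlevels)) {
  seen <- seen & test[[v]] %in% lm_mod$xlevels[[v]]
}

# Predict only for those rows; rows with unseen levels stay NA
lm_pred <- rep(NA_real_, nrow(test))
lm_pred[seen] <- predict(lm_mod, test[seen, ])
```

With this formula `predict()` may also warn about a rank-deficient fit, since `model` largely determines `manufacturer`. Dropping `model` from the formula, or splitting the data so that every level appears in the training half, would be alternatives to leaving those predictions as `NA`.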
