
I am trying to predict on new data that may, in some cases, contain factor levels that were not present in the data used to fit the model. To handle this, I want to make the factor levels in the new data match those of the old data, changing any values that don't match to NA as described here. I can do this manually, column by column, but I want to generalize the replacement to all the columns in my data frame. Could someone give some insight into how to do this, presumably with apply?
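For reference, the manual column-by-column version can be a single line per column: re-creating the factor with the old levels makes `factor()` map every value that is not among the supplied `levels` to `NA`. A minimal sketch, assuming a hypothetical factor column `col` that exists in both `oldDta` and `newDta`:

```r
# Hypothetical example data: oldDta was used to fit the model,
# newDta contains a level ("d") that oldDta never had.
oldDta <- data.frame(col = factor(c("a", "b", "c")))
newDta <- data.frame(col = factor(c("a", "d", "c")))

# Re-create the factor using the old levels; any value not among
# those levels becomes NA automatically.
newDta$col <- factor(newDta$col, levels = levels(oldDta$col))

print(newDta$col)
```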

I've tried using the function below:

lapply(newDta, function(x) {
    newFactorVector <- which(!(newDta[, x] %in% levels(oldDta[, x])))
    newDta[newFactorVector, x] <- NA
    levels(newDta[, x]) <- levels(oldDta[, x])
})

but it throws the following error:

Error in Summary.factor(c(2L, 1L, 7L, 1L, 7L, 2L, 2L, 2L, 2L, 7L, 1L,  :
min not meaningful for factors 
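A likely cause of the error: `lapply(newDta, ...)` passes each column's *values* as `x`, not the column's name, so `newDta[, x]` tries to index the data frame with a factor. Even with correct indexing, the assignments inside the anonymous function only modify a local copy of `newDta`, and `levels<-` renames levels by position rather than matching values. The sketch below iterates over column names instead and lets `factor()` do the NA replacement. The example data frames and the `factorNames` helper are hypothetical, and it assumes the factor columns have the same names in both data frames.

```r
# Hypothetical example data: both data frames are assumed to share
# column names, with the model's factor columns stored as factors.
oldDta <- data.frame(
  colour = factor(c("red", "green", "blue")),
  size   = factor(c("S", "M", "L")),
  weight = c(1.5, 2.0, 3.1)
)
newDta <- data.frame(
  colour = factor(c("red", "purple", "blue")),  # "purple" is a new level
  size   = factor(c("XL", "M", "S")),           # "XL" is a new level
  weight = c(2.2, 1.1, 0.7)
)

# Hypothetical helper: names of the columns that are factors.
factorNames <- function(df) names(df)[vapply(df, is.factor, logical(1))]

# Only align columns that are factors in both data frames.
factorCols <- intersect(factorNames(newDta), factorNames(oldDta))

# Loop over column *names* so newDta and oldDta are indexed with the
# same key; factor() turns values outside the old levels into NA.
newDta[factorCols] <- lapply(factorCols, function(nm) {
  factor(newDta[[nm]], levels = levels(oldDta[[nm]]))
})

print(newDta)
```

Because the result of `lapply()` is assigned back into `newDta[factorCols]`, the data frame itself is updated, and columns that are not factors in both data frames (such as the numeric `weight` column) are left untouched.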

Thanks.
