I am trying to use the tidymodels package to build a linear mixed model. It looks like I'm specifying the formula in the correct way, as I can run a fit() on the workflow. However, when I try to run it on resamples of my data, using the function fit_resamples(), I get an error relative to missing factor levels.
It is not clear to me if I'm doing something wrong, or if the packages "multilevelmod" and "tune" are not compatible in this way, and so I would greatly appreciate any advice.
I have included a reprex using the "mpg" dataset.
EDIT: After looking more into the problem, and especially at this question: Prediction with lme4 on new levels I found out how to make lmer models predict on new value combinations, by using the allow.new.levels = TRUE argument in the predict() function. My question is then, how can I specify this inside workflow() or fit_resamples()?
I guessed that adding step_novel() in the recipe and default_recipe_blueprint(allow_novel_levels = TRUE) in the add_recipe() call would be sufficient (as in tidymodels Novel levels found in column), but it still doesn't seem to work.
library(tidyverse)
library(tidymodels)
library(multilevelmod)
set.seed(1243)
data(mpg, package = "ggplot2")
training = mpg %>%
initial_split() %>%
training() %>%
mutate(manufacturer = manufacturer %>% as_factor(),
model = model %>% as_factor())
training_folds = training %>%
validation_split()
lmm_model = linear_reg() %>%
set_engine("lmer")
lmm_recipe = recipe(cty ~ year + manufacturer + model, data = training) %>%
step_novel(manufacturer, model)
lmm_formula = cty ~ year + (1|manufacturer/model)
lmm_workflow = workflow() %>%
# see: https://stackoverflow.com/questions/68183077/tidymodels-novel-levels-found-in-column
add_recipe(lmm_recipe, blueprint = hardhat::default_recipe_blueprint(allow_novel_levels = TRUE)) %>%
add_model(lmm_model, formula = lmm_formula)
# A simple fit works
fit(lmm_workflow, training)
#> == Workflow [trained] ==========================================================
#> Preprocessor: Recipe
#> Model: linear_reg()
#>
#> -- Preprocessor ----------------------------------------------------------------
#> 1 Recipe Step
#>
#> * step_novel()
#>
#> -- Model -----------------------------------------------------------------------
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: cty ~ year + (1 | manufacturer/model)
#> Data: data
#> REML criterion at convergence: 864.9986
#> Random effects:
#> Groups Name Std.Dev.
#> model:manufacturer (Intercept) 2.881
#> manufacturer (Intercept) 2.675
#> Residual 2.181
#> Number of obs: 175, groups: model:manufacturer, 38; manufacturer, 15
#> Fixed Effects:
#> (Intercept) year
#> -8.06202 0.01228
# A fit with resamplings doesn't work:
fit_resamples(lmm_workflow, resamples = training_folds)
#> Warning: package 'lme4' was built under R version 4.1.1
#> x validation: preprocessor 1/1, model 1/1 (predictions): Error:
#> ! Assigned data `facto...
#> Warning: All models failed. See the `.notes` column.
#> # Resampling results
#> # Validation Set Split (0.75/0.25)
#> # A tibble: 1 x 4
#> splits id .metrics .notes
#> <list> <chr> <list> <list>
#> 1 <split [131/44]> validation <NULL> <tibble [1 x 3]>
#>
#> There were issues with some computations:
#>
#> - Error(s) x1: ! Assigned data `factor(lvl[1], levels = lvl)` must be compatibl...
#>
#> Use `collect_notes(object)` for more information.
# It seems that the problem is that the combinations of factor levels differ
# between analysis and assessment set
analysis_set = analysis(training_folds$splits[[1]])
assessment_set = assessment(training_folds$splits[[1]])
identical(
analysis_set %>% distinct(manufacturer, model),
assessment_set %>% distinct(manufacturer, model)
)
#> [1] FALSE
# directly fitting the model on the analysis set
analysis_fit = lmer(formula = lmm_formula, data = analysis_set)
# predicting the values for the missing combinations of levels is actually possible
# see: https://stackoverflow.com/questions/29259750/prediction-with-lme4-on-new-levels
assessment_predict = analysis_fit %>%
predict(training_folds$splits[[1]] %>% assessment(),
allow.new.levels = TRUE)
Created on 2022-05-24 by the reprex package (v2.0.1)