Prediction with lme4 on new levels

Question

I'm trying to fit a mixed effects model and then use that model to generate estimates on a new dataset that may have different levels. I expected that the estimates on a new dataset would use the mean value of the estimated parameters, but that doesn't seem to be the case. Here's a minimum working example:

library(lme4)
d = data.frame(x = rep(1:10, times = 3),
               y = NA,
               grp = rep(1:3, each = 10))
d$y[d$grp == 1] = 1:10 + rnorm(10)
d$y[d$grp == 2] = 1:10 * 1.5 + rnorm(10)
d$y[d$grp == 3] = 1:10 * 0.5 + rnorm(10)
fit = lmer(y ~ (1+x)|grp, data = d)
newdata = data.frame(x = 1:10, grp = 4)
predict(fit, newdata = newdata, allow.new.levels = TRUE)

In this example, I'm essentially defining three groups with different regression equations (slopes of 1, 1.5 and 0.5). However, when I try to predict on a new dataset with an unseen level, I get a constant estimate. I would have expected the expected value of the slope and intercept to be used to generate predictions for this new data. Am I expecting the wrong thing? Or, what am I doing wrong with my code?

I believe `predict.merMod` just uses the coefficients from the fixed effects parts of the model for new levels. `y ~ x + (x|grp)` is a more sensible model specification. — Roland, Mar 25 '15 at 16:35
Ah, that makes sense! If you add that as an answer I'll accept it. — random_forest_fanatic, Mar 26 '15 at 06:13

score 18 · Accepted Answer · answered Mar 26 '15 at 08:11

18

I generally wouldn't include a random slope without including a fixed slope. It seems like predict.merMod agrees with me, because it seems to simply use only the fixed effects to predict for new levels. The documentation says "the prediction will use the unconditional (population-level) values for data with previously unobserved levels", but these values don't seem to be estimated with your model specification.

Thus, I suggest this model:

fit = lmer(y ~ x + (x|grp), data = d)
newdata = data.frame(x = 1:10, grp = 4)
predict(fit, newdata = newdata, allow.new.levels = TRUE)
#       1         2         3         4         5         6         7         8         9        10 
#1.210219  2.200685  3.191150  4.181616  5.172082  6.162547  7.153013  8.143479  9.133945 10.124410

This is the same as only using the fixed effects part of the model:

t(cbind(1, newdata$x) %*% fixef(fit))
#         [,1]     [,2]    [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]
#[1,] 1.210219 2.200685 3.19115 4.181616 5.172082 6.162547 7.153013 8.143479 9.133945 10.12441

answered Mar 26 '15 at 08:11

Roland

127,288
10
191
288

I understand that this will still use the fixed effects only in the new prediction. But, how do you add the random effects? – Sapiens Sep 10 '20 at 13:41
I don't understand your question. – Roland Sep 10 '20 at 13:43
As you said before: 'predict.merMod just uses the coefficients from the fixed effects parts of the model for new levels'. Is there a way to also include the random effects (x|grp)? – Sapiens Sep 10 '20 at 13:56
Sure, that's the default. You just need to use the default `allow.new.levels = FALSE`. Of course, you can't predict random effects for new levels (which have not been part of the training data). That's conceptually not possible. – Roland Sep 10 '20 at 14:18
1

I see. I am interested in finding an estimation of the random effects for a completely new subject, I guess I am using the wrong approach. Thank you anyways. – Sapiens Sep 12 '20 at 14:24

Ben Bolker · Answer 2 · 2023-06-28T18:29:12.027

7

Maybe it's not clear enough, but I think the documentation for ?predict.merMod states (reasonably) clearly what happens when allow.new.levels=TRUE. I guess the ambiguity might be in what "unconditional (population-level) values" means ...

allow.new.levels: logical if new levels (or NA values) in ‘newdata’ are allowed. If FALSE (default), such new values in ‘newdata’ will trigger an error; if TRUE, then the prediction will use the unconditional (population-level) values for data with previously unobserved levels (or NAs).

"Unconditional (population-level)" means that the corresponding random effect components are set to zero — which is what we do if we cannot condition on the observed data for a particular group, since we don't want to specify that the prediction is for a particular group

edited Jun 28 '23 at 18:29

answered Mar 26 '15 at 12:20

Ben Bolker

211,554
25
370
453

I'm experiencing a similar confusion as the original poster. Can you please elaborate on how 'allow.new.levels' works? What is an unconditional (population-level) value? – Arthur Sep 29 '22 at 13:53
1

unconditional/population-level means that the corresponding random effects are set to zero (which is what we would do if we cannot **condition on** the fact that an observation comes from a particular group) – Ben Bolker Sep 29 '22 at 13:56
Appreciate the answer. Is `predict.merMod` robust for complex cases, where for example, a hierarchical term `individual\family` is being predicted for a new `individual` but a known `family`? I assume what makes sense here is to set the coefficient for the new `individual` to zero but use the fit coefficient for the appropriate `family`. – Arthur Sep 29 '22 at 14:29
1

Yes. You can either use `re.form` to include/exclude relevant terms, or make the specified family a new (previously unobserved) level and use `allow.new.levels = TRUE` – Ben Bolker Sep 29 '22 at 14:30

Prediction with lme4 on new levels

2 Answers2

Linked