13

I'm trying to fit a mixed effects model and then use that model to generate estimates on a new dataset that may have different levels. I expected that the estimates on a new dataset would use the mean value of the estimated parameters, but that doesn't seem to be the case. Here's a minimum working example:

library(lme4)
d = data.frame(x = rep(1:10, times = 3),
               y = NA,
               grp = rep(1:3, each = 10))
d$y[d$grp == 1] = 1:10 + rnorm(10)
d$y[d$grp == 2] = 1:10 * 1.5 + rnorm(10)
d$y[d$grp == 3] = 1:10 * 0.5 + rnorm(10)
fit = lmer(y ~ (1+x)|grp, data = d)
newdata = data.frame(x = 1:10, grp = 4)
predict(fit, newdata = newdata, allow.new.levels = TRUE)

In this example, I'm essentially defining three groups with different regression equations (slopes of 1, 1.5 and 0.5). However, when I try to predict on a new dataset with an unseen level, I get a constant estimate. I would have expected the expected value of the slope and intercept to be used to generate predictions for this new data. Am I expecting the wrong thing? Or, what am I doing wrong with my code?

random_forest_fanatic
  • 1,232
  • 1
  • 12
  • 30

2 Answers2

18

I generally wouldn't include a random slope without including a fixed slope. It seems like predict.merMod agrees with me, because it seems to simply use only the fixed effects to predict for new levels. The documentation says "the prediction will use the unconditional (population-level) values for data with previously unobserved levels", but these values don't seem to be estimated with your model specification.

Thus, I suggest this model:

fit = lmer(y ~ x + (x|grp), data = d)
newdata = data.frame(x = 1:10, grp = 4)
predict(fit, newdata = newdata, allow.new.levels = TRUE)
#       1         2         3         4         5         6         7         8         9        10 
#1.210219  2.200685  3.191150  4.181616  5.172082  6.162547  7.153013  8.143479  9.133945 10.124410

This is the same as only using the fixed effects part of the model:

t(cbind(1, newdata$x) %*% fixef(fit))
#         [,1]     [,2]    [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]
#[1,] 1.210219 2.200685 3.19115 4.181616 5.172082 6.162547 7.153013 8.143479 9.133945 10.12441
Roland
  • 127,288
  • 10
  • 191
  • 288
  • I understand that this will still use the fixed effects only in the new prediction. But, how do you add the random effects? – Sapiens Sep 10 '20 at 13:41
  • I don't understand your question. – Roland Sep 10 '20 at 13:43
  • As you said before: 'predict.merMod just uses the coefficients from the fixed effects parts of the model for new levels'. Is there a way to also include the random effects (x|grp)? – Sapiens Sep 10 '20 at 13:56
  • Sure, that's the default. You just need to use the default `allow.new.levels = FALSE`. Of course, you can't predict random effects for new levels (which have not been part of the training data). That's conceptually not possible. – Roland Sep 10 '20 at 14:18
  • 1
    I see. I am interested in finding an estimation of the random effects for a completely new subject, I guess I am using the wrong approach. Thank you anyways. – Sapiens Sep 12 '20 at 14:24
7

Maybe it's not clear enough, but I think the documentation for ?predict.merMod states (reasonably) clearly what happens when allow.new.levels=TRUE. I guess the ambiguity might be in what "unconditional (population-level) values" means ...

allow.new.levels: logical if new levels (or NA values) in ‘newdata’ are allowed. If FALSE (default), such new values in ‘newdata’ will trigger an error; if TRUE, then the prediction will use the unconditional (population-level) values for data with previously unobserved levels (or NAs).

"Unconditional (population-level)" means that the corresponding random effect components are set to zero — which is what we do if we cannot condition on the observed data for a particular group, since we don't want to specify that the prediction is for a particular group

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • I'm experiencing a similar confusion as the original poster. Can you please elaborate on how 'allow.new.levels' works? What is an unconditional (population-level) value? – Arthur Sep 29 '22 at 13:53
  • 1
    unconditional/population-level means that the corresponding random effects are set to zero (which is what we would do if we cannot **condition on** the fact that an observation comes from a particular group) – Ben Bolker Sep 29 '22 at 13:56
  • Appreciate the answer. Is `predict.merMod` robust for complex cases, where for example, a hierarchical term `individual\family` is being predicted for a new `individual` but a known `family`? I assume what makes sense here is to set the coefficient for the new `individual` to zero but use the fit coefficient for the appropriate `family`. – Arthur Sep 29 '22 at 14:29
  • 1
    Yes. You can either use `re.form` to include/exclude relevant terms, or make the specified family a new (previously unobserved) level and use `allow.new.levels = TRUE` – Ben Bolker Sep 29 '22 at 14:30