
I have looked at Prediction with lme4 on new levels, where the R documentation for allow.new.levels=TRUE is quoted.

I don't understand what "if allow.new.levels=TRUE, then the prediction will use the unconditional (population-level) values for data with previously unobserved levels (or NAs)" actually means.

In my case I have a data.frame called exdata

 subjects y        x1         x2
   1 0 1.6179339  0.9517194
   1 0 1.4128789  1.0248514
   1 0 0.9127448  1.8073684
   1 0 1.5729219  2.1003925
   1 0 1.6254359  1.2471660
   2 0 1.6626074  8.5102559
   2 0 1.3903638  6.0425018
   2 0 1.1438239  2.5654422
   2 0 1.1393088  2.9982242
   2 0 1.1564141  2.8395960
   3 0 1.1688192 13.9791461
   3 0 0.9255715 18.8544778
   3 0 1.2369097  4.2376671
   3 0 1.3021943  9.6894289
   3 0 1.2296961 12.4789910
   4 0 1.0978131  2.0577688
   4 0 1.1405409  1.4339044
   4 0 1.0355546  1.9496732
   4 0 1.1370849  1.7402332
   4 0 1.1942591  1.3509880
   5 0 0.4141535  2.1723957
   5 0 0.9129311  0.8274350
   5 0 0.9658796  1.2754419
   5 0 0.8370701  2.1998756
   5 0 0.5509546  2.3590774
   6 0 1.2827411  1.5474088
   6 0 1.1636606  0.7746669
   6 0 1.1782936  1.1566909
   6 0 1.1630238  1.7486415
   6 0 1.1565711  0.6984409
   7 0 0.8600331  0.1382253
   7 0 0.8303510  0.1927431
   7 0 0.8087967  0.6065926
   7 0 0.7815187  0.9464185
   7 0 0.7532042  0.9771646
   8 0 1.1638190  1.3456340
   8 0 0.5867126  1.4862727
   8 0 0.6523964  0.5138441
   8 0 0.9513971  2.3932337
   8 0 0.9278743  2.3273670
   9 1 1.0978606  1.2585635
   9 1 1.0414897  1.2946008
   9 0 0.6215353  0.2907148
   9 0 1.0267046  1.0173432
   9 0 1.1470992  0.7014759
  10 0 0.9505266  0.4247866
  10 0 0.8624758  0.2276577
  10 0 0.8279061  0.2314898
  10 0 0.7856832  0.3143003
  10 0 0.7569739  0.7880622

with 10 subjects, y as a binary (0/1) response, and two explanatory variables. I want to model it as a random-intercept logistic (logit-link) GLMM, so I use

model <- glmer(y~x1+x2+ (1|subjects), family=binomial(link="logit"), data=exdata[exdata$subjects!=5,], nAGQ=1)

I leave out subject 5 because I want to use that subject for prediction. The fitted model has a random-intercept variance of 82.55 and fixed-effect estimates

(Intercept) -14.8377
x1            4.0366
x2            0.1056
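
These numbers are just read off the fitted model, e.g. with

VarCorr(model)   # variance of the random intercept for subjects
fixef(model)     # fixed-effect estimates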

The prediction for the new subject is

prediction <- predict(model, newdata=exdata[exdata$subjects==5,], type="response", allow.new.levels=TRUE)

giving

2.408299e-06 1.564704e-05 2.031392e-05 1.331586e-05 4.266690e-06

How are these values computed? My hope was that a random effect for subject 5 is estimated and then combined with the coefficients from model to calculate the probability. Can I print the ranef() of subject 5 in any way? Since I don't understand the R documentation for allow.new.levels, I can't work out how the prediction is made.

Camilla

2 Answers


In this specific example only the fixed part of the model is used (since that is the population-level model for a model with a single random effect):

# linear predictor for subject 5 from the fixed effects only
y1 <- as.matrix(cbind(1, exdata[exdata$subjects == 5, c("x1", "x2")])) %*% fixef(model)
# inverse-logit transform to get predicted probabilities
c(exp(y1)/(1 + exp(y1)))
#[1] 2.408298e-06 1.564704e-05 2.031392e-05 1.331586e-05 4.266689e-06
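
Equivalently (just a check, using the same newdata as in the question), you can tell predict() explicitly to ignore the random effects with re.form = NA; for an unobserved level, allow.new.levels = TRUE amounts to setting the random effect to zero, so the two calls should agree:

# population-level prediction: all random effects set to zero
predict(model, newdata = exdata[exdata$subjects == 5, ], type = "response", re.form = NA)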
Roland
  • Right, which means that if she wants to generate predictions for subject 5 that incorporate random effects for that subject, she needs to re-estimate the original model using the now-expanded sample that includes observations for subject 5, then run `predict()` using that model. – ulfelder Jul 17 '15 at 14:24
  • Another possibility (considerably harder to implement!) would be making predictions for subject 5 that are conditional on *all* of (1) the fixed-effect parameters; (2) the estimated among-subjects variance from the other 9 subjects; (3) the observed data for subject 5 -- that is, shrinkage estimates/predictions. I'm not sure it actually makes sense, but you could conceivably do it. The difference from @ulfelder's suggestion is that you wouldn't be re-estimating the among-subjects variance when including subject 5 ... – Ben Bolker Jul 17 '15 at 20:49

So this means that lme4 (or another R package) does not have a function that makes predictions for new levels while still estimating the random effect for that new level? The suggestion from @ulfelder is the easiest, but it changes my estimates every time a new subject is added, and the prediction for subject 5 is then based on a model that uses data from subject 5. I would like to avoid that, because the model is then predicting its own data, so to speak. @BenBolker's approach avoids this, but I don't know how to implement it.

What do you think about combining the two methods?

  1. Run a new model (model1) using data that includes observations for subject 5.
  2. Find ranef() for subject 5.
  3. Calculate the predictions as @Roland did (still with the estimates etc. from the initial model).
  4. Add the random effect.

that is

model1 <- glmer(y ~ x1 + x2 + (1 | subjects), family = binomial(link = "logit"),
                data = exdata, nAGQ = 1)
# fixed-effect part for subject 5, still using the original model (fitted without subject 5)
y1 <- as.matrix(cbind(1, exdata[exdata$subjects == 5, c("x1", "x2")])) %*% fixef(model)
# add the random intercept of subject 5, taken from model1 (its 5th level)
y2 <- y1 + unlist(ranef(model1))[5]
c(exp(y2)/(1 + exp(y2)))

2.398097e-06 1.558076e-05 2.022787e-05 1.325946e-05 4.248617e-06

Unfortunately the estimates (and also the random effects) change in model1 compared to model, but I am trying to avoid using those changes when predicting for subject 5.
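
My best guess at implementing @BenBolker's suggestion would be a rough sketch along these lines (negpen and the search interval are just my own choices, and I am not sure this is exactly the right penalised likelihood): keep fixef(model) and the among-subject variance from model fixed, and find the conditional mode of subject 5's random intercept from subject 5's data alone.

X5   <- as.matrix(cbind(1, exdata[exdata$subjects == 5, c("x1", "x2")]))
y5   <- exdata$y[exdata$subjects == 5]
eta0 <- drop(X5 %*% fixef(model))             # fixed-effect linear predictor
s2   <- as.numeric(VarCorr(model)$subjects)   # estimated random-effect variance from model

negpen <- function(b) {                       # negative penalised log-likelihood in b
  p <- plogis(eta0 + b)
  -(sum(dbinom(y5, 1, p, log = TRUE)) + dnorm(b, 0, sqrt(s2), log = TRUE))
}
b5 <- optimize(negpen, c(-30, 30))$minimum    # conditional mode for subject 5
plogis(eta0 + b5)                             # shrinkage-type predictions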

Camilla