
This may be more of a bug report than a question, but: why does explicitly passing the training data to `predict` via the `newdata` argument sometimes produce different predictions than omitting `newdata` (which implicitly uses the training data)?

library(lme4)
packageVersion("lme4")  # 1.1.8
# myformula and X are my actual formula and training data (not shown)
m1 <- glmer(myformula, data = X, family = "binomial")
p1 <- predict(m1, type = "response")
p2 <- predict(m1, type = "response", newdata = X)
all(p1 == p2)  # FALSE

This isn't just a rounding error: `cor(p1, p2)` comes out around 0.8.
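To rule out floating-point noise as the cause, a comparison with a tolerance is useful. Here is a sketch with made-up vectors standing in for `p1` and `p2`: `all.equal` tolerates rounding-level differences, while `==` does not.

```r
p1 <- c(0.10, 0.50, 0.90)
p2_round <- p1 + 1e-12           # differs only by floating-point noise
p2_real  <- c(0.10, 0.62, 0.71)  # genuinely different predictions

all(p1 == p2_round)              # FALSE: == is exact, so noise fails it
isTRUE(all.equal(p1, p2_round))  # TRUE: within numeric tolerance
isTRUE(all.equal(p1, p2_real))   # FALSE: a real discrepancy
max(abs(p1 - p2_real))           # magnitude of the largest discrepancy
```

In my case `all.equal` also fails, so the predictions genuinely differ.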

This seems to be isolated to models with random slopes. In the plot below, "implicit" means `predict(..., type="response")` without `newdata`, and "explicit" means `predict(..., type="response", newdata=X)`, where `X` is the training data. The only difference between model 1 and the other models is that model 1 contains only random intercepts, while the others have both random intercepts and random slopes.

[plot: implicit vs. explicit predictions by model]
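Update, per user20650's comment below: the problem also reproduces without my data, using the built-in `sleepstudy` dataset and a random-slope `lmer` model:

```r
library(lme4)
m <- lmer(Reaction ~ Days + (Days || Subject), sleepstudy)
p1 <- predict(m, type = "response")
p2 <- predict(m, type = "response", newdata = sleepstudy)
all(p1 == p2)  # FALSE on affected setups; oddly, TRUE on some others
```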

Jack Tanner
  • It would be helpful to provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data such that we can also run the code to verify. Be sure to include how you specified the formula. – MrFlick Feb 05 '15 at 23:59
  • 1
    @MrFlick; try with `m1 <- lmer(Reaction ~ Days + (Days||Subject), sleepstudy)` – user20650 Feb 06 '15 at 00:10
  • 1
    Hmm, predictions are equal using `newX` rather than `newdata` – user20650 Feb 06 '15 at 00:16
  • @user20650 you're right! newX does the trick. For those playing at home, try `?lme4::predict.merMod`. But why does `lme4::predict.merMod` treat `newdata` and `newX` differently? – Jack Tanner Feb 06 '15 at 03:17
  • Oh no. `lme4::predict.merMod` declares `newX` as a parameter... and NEVER uses it. So `newX` is the same as omitting `newdata`, but not because it treats `newX` differently! If you specify `newX` and not `newdata`, then `newdata` is omitted, and it's the same as my "implicit" case. @BenBolker, could you help? – Jack Tanner Feb 06 '15 at 03:42
  • FWIW, with lme4 1.1-7 (on linux, R 3.1.1 yep, a little out of date now) the **sleepstudy** example does return `TRUE` for `all(p1 == p2)`. – Gavin Simpson Feb 06 '15 at 21:51
  • OK, the `newX` thing is certainly a mistake (probably something I was starting to implement that escaped). But so far I don't have a reproducible example (because the `sleepstudy` example does work -- on 1.1-8 as well as 1.1-7). I don't doubt that there's a bug in predict.merMod somewhere, but I will really need an example to use for debugging. – Ben Bolker Feb 06 '15 at 21:59
  • PS haven't checked a `glmer` example because I don't have a random-slopes GLMM model handy -- I doubt that `glmer`/`lmer` is the difference though. – Ben Bolker Feb 06 '15 at 22:04
  • @BenBolker and Gavin, thanks for the answer; but a little strange - as with the OP, the predictions for the `sleepstudy` model aren't equal for me on ubuntu 12.04, R3.1.2, lme4 1.1.7. Just to check, is this the right commands `m1 <- lmer(Reaction ~ Days + (Days||Subject), sleepstudy); p1 <- predict(m1, type="response"); p2 <- predict(m1, type="response", newdata=sleepstudy)` – user20650 Feb 07 '15 at 01:58
  • @BenBolker I confirm that the `sleepstudy` example fails for me, exactly as for @user20650 ^^. I'm on Windows 7 x64, R 3.1.2, lme4 1.1.8. – Jack Tanner Feb 07 '15 at 03:42
  • 1
    conversation continued at https://github.com/lme4/lme4/issues/279 . I'm having a hard time thinking of what could differ among these platforms (FWIW, I'm not aware offhand of any changes in the predict method between 1.1-7 and 1.1-8 ...) – Ben Bolker Feb 07 '15 at 16:48

1 Answer


It turns out that this is a bug in `predict.merMod` that was fixed in the development version in November 2014 (see this GitHub issue). If you have compilation tools installed, you can install the development version directly from GitHub via

devtools::install_github("lme4/lme4")
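After installing, restarting R and re-running the comparison should confirm the fix; a sketch using the `sleepstudy` example from the comments above:

```r
library(lme4)
packageVersion("lme4")  # should now be newer than the CRAN 1.1-8
m <- lmer(Reaction ~ Days + (Days || Subject), sleepstudy)
p1 <- predict(m, type = "response")
p2 <- predict(m, type = "response", newdata = sleepstudy)
all(p1 == p2)  # should be TRUE once the fix is in place
```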
Ben Bolker