When we use a traditional logistic regression and make a prediction in R, for example:

library(dplyr)
n <- 300
xx <- c("r1", "r2", "r3", "r4", "r5")   # levels of Rank
xxx <- c("e1", "e2", "e3")              # levels of School
p <- 0.3
# tibble() replaces the deprecated data_frame()
df1 <- tibble(
  xx1 = runif(n, min = 0, max = 10),
  xx2 = runif(n, min = 0, max = 10),
  xx3 = runif(n, min = 0, max = 10),
  School = factor(sample(xxx, n, replace = TRUE)),
  Rank = factor(sample(xx, n, replace = TRUE)),
  yx = as.factor(rbinom(n, size = 1, prob = p))
)
df1
# ordinary logistic regression
mm <- glm(yx ~ xx1 + xx2 + xx3 + School + Rank, family = binomial, data = df1)
# a single new observation to predict for
n11 <- data.frame(School = "e3", Rank = "r2", xx1 = 8.58, xx2 = 8.75, xx3 = 7.92)

We use:

predict(mm, n11, type="response") # in my specific case

or predict(mm, n11)

depending on what interests us, and there is no problem.

But when we work with a GLMM:

library(lme4)
mm2 <- glmer(yx ~ xx1 + xx2 + xx3 + Rank + (Rank | School), data = df1,
             family = "binomial", control = glmerControl(calc.derivs = FALSE))
predict(mm2, n11, type="response") # in my specific case

it displays the error:

 Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
contrasts can be applied only to factors with 2 or more levels

I also tried this:

 predict(m2,n11, re.form=(~Rank|School))

which displays the error:

Error in UseMethod("predict") : 
no applicable method for 'predict' applied to an object of class "glmmadmb"

What is the correct way to make predictions from a GLMM in R?

Cleber Iack
  • `n11 = data.frame(School=factor("e3", levels = levels(df1$School)), Rank=factor("r2", levels = levels(df1$Rank)),xx1=8.58,xx2=8.75,xx3=7.92)` – Roland Jan 26 '18 at 13:16
  • @Roland, please post as answer? I've started an issue [here](https://github.com/lme4/lme4/issues/452) (We try to cover these cases, but I clearly haven't got all edge [?] cases covered ...) – Ben Bolker Jan 30 '18 at 12:33
  • BTW in the second example you're clearly trying to predict with a `glmmadmb` object rather than a `merMod` (lme4) object ... – Ben Bolker Mar 09 '18 at 19:10

1 Answer

The problem is that the new data you provide doesn't match the structure the model expects. More specifically, in `n11` the variables School and Rank (automatically converted to factors by `data.frame()` in R versions before 4.0.0) each have only a single level, whereas the model was fitted with three levels for School and five for Rank. The model has parameters for all of those levels, so if they can't be found in the new data, the proper design matrix can't be built to calculate the predictions.
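
To make the design-matrix point concrete, here is a minimal base-R sketch. It does not use lme4 at all; it only assumes that the same kind of contrast coding is involved, and the names `train_rank`, `new_bad` and `new_good` are made up for illustration.

```r
# Levels that the (hypothetical) training data contained for Rank
train_rank <- factor(c("r1", "r2", "r3", "r4", "r5"))

# New data built naively: the factor only knows the single level "r2"
new_bad <- data.frame(Rank = factor("r2"))

# New data built with the full set of training levels
new_good <- data.frame(Rank = factor("r2", levels = levels(train_rank)))

# Contrasts cannot be set up for a one-level factor, so this call fails
# with "contrasts can be applied only to factors with 2 or more levels"
bad <- try(model.matrix(~ Rank, data = new_bad), silent = TRUE)
inherits(bad, "try-error")

# With all five levels present, the design matrix has one row and
# one column per coefficient (intercept plus four dummy columns)
X <- model.matrix(~ Rank, data = new_good)
X
```

The second call succeeds because the factor carries all five levels, even though only one of them is actually observed in the new row.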

That's the underlying reason why @Roland is right in the comments: you have to explicitly create the variables with the same levels as in the data used to train the model.

n11 <- data.frame(School=factor("e3", levels = levels(df1$School)), 
                  Rank=factor("r2", levels =levels(df1$Rank)),
                  xx1=8.58,xx2=8.75,xx3=7.92)
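
If there are many factor columns, typing the levels by hand gets tedious. As a rough sketch, a small helper could copy the levels over from the training data. Note that `align_levels` is a hypothetical function written for this answer, not something lme4 provides, and the toy `train` and `new` data frames are only there to show the effect.

```r
# Hypothetical helper: give every column of `newdata` that is a factor
# in `train` the same levels as in `train`.
# Values not seen in the training data become NA.
align_levels <- function(newdata, train) {
  for (nm in intersect(names(newdata), names(train))) {
    if (is.factor(train[[nm]])) {
      newdata[[nm]] <- factor(as.character(newdata[[nm]]),
                              levels = levels(train[[nm]]))
    }
  }
  newdata
}

# Toy example (made-up data)
train <- data.frame(School = factor(c("e1", "e2", "e3")), xx1 = c(1, 2, 3))
new <- data.frame(School = "e3", xx1 = 8.58)

aligned <- align_levels(new, train)
levels(aligned$School)
```

With the objects from the question, the call would then be `predict(mm2, align_levels(n11, df1), type = "response")`.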
Joris Meys