5

I’m having a bit of a struggle trying to use the lme4 predict function on my mixed models. When making predications I want to be able to set some of my explanatory variables to a specified level but average across others.

Here’s some made up data that is a simplified, nonsense version of my original dataset:

a <-  data.frame(
    TLR4=factor(rep(1:3, each=4, times=4)), 
    repro.state=factor(rep(c("a","j"),each=6,times=8)), 
    month=factor(rep(1:2,each=8,times=6)), 
    sex=factor(rep(1:2, each=4, times=12)), 
    year=factor(rep(1:3, each =32)), 
    mwalkeri=(sample(0:15, 96, replace=TRUE)), 
    AvM=(seq(1:96))
)

The AvM number is the water vole identification number. The response variable (mwalkeri) is a count of the number of fleas on each vole. The main explanatory variable I am interested in is Tlr4 which is a gene with 3 different genotypes (coded 1, 2 and 3). The other explanatory variables included are reproductive state (adult or juvenile), month (1 or 2), sex (1 or 2) and year (1, 2 or 3). My model looks like this (of course this model is now inappropriate for the made up data but that shouldn't matter):

install.packages("lme4")
library(lme4)
mm <- glmer(mwalkeri~TLR4+repro.state+month+sex+year+(1|AvM), data=a, 
    family=poisson,control=glmerControl(optimizer="bobyqa"))`
summary(mm)

I want to make predictions about parasite burden for each different Tlr4 genotype while accounting for all the other covariates. To do this I created a new dataset to specify the level I wanted to set each of the explanatory variables to and used the predict function:

b <-  data.frame(
    TLR4=factor(1:3), 
    repro.state=factor(c("a","a","a")),
    month=factor(rep(1, times=3)), 
    sex=factor(rep(1, times=3)), 
    year=factor(rep(1, times=3))
)
predict(mm, newdata=b, re.form=NA, type="response")

This did work but I would really prefer to average across years instead of setting year to one particular level. However, whenever I attempt to average year I get this error message:

Error in model.frame.default(delete.response(Terms), newdata, na.action = na.action, : factor year has new level

Is it possible for me to average across years instead of selecting a specified level? Also, I've not worked out how to get the standard error associated with these predictions. The only way I've been able to get standard error for predictions was using the lsmeans() function (from the lsmeans package):

c <- lsmeans(mm, "TLR4", type="response")
summary(c, type="response")

Which automatically generates the standard error. However, this is generated by averaging across all the other explanatory variables. I'm sure it’s probably possible to change that but I would rather use the predict() function if I can. My goal is to create a graph with Tlr4 genotype on the x-axis and predicted parasite burden on the y-axis to demonstrate the predicted differences in parasite burden for each genotype while all other significant covariants are accounted for.

MrFlick
  • 195,160
  • 17
  • 277
  • 295
Martha
  • 51
  • 1
  • 5
  • You haven't included a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so it's difficult to make specific coding suggestions and test whether or not they work. As long as you pass in a new data.frame with column names that exactly match your original data set, it should be pretty easy to use `predict()`. How about you include code for one of your attempts and the exact error message returned. – MrFlick Oct 22 '14 at 19:20
  • I would like to spend some time on this but haven't had a moment yet. You have to specify *some* value for every fixed effect, but any one of the following three options is reasonable: (1) set it to a mean or baseline value ("conditional"); (2) allow it to vary over a reasonable range and then average the predicted values ("marginal"); (3) allow it to vary over a range and find a way to plot all the values in your final plot (grouping lines, or within facets, or by colors, or something). – Ben Bolker Oct 23 '14 at 01:13
  • Thanks for your replies! @MrFlick I've edited my original question to include some of my code and have provided a reproducible example; hopefully this makes it a bit clearer. @Ben Bolker is it possible to set some of my explanatory variables to a specified level but look at the mean of others depending on what makes the most sense biologically? I'm okay using `predict()` when I set it to a level that’s already coded for in the dataset but I can’t make it work if I try to do anything else (e.g. take the average of year). As always, any advice would be appreciated! – Martha Oct 28 '14 at 13:33
  • Are you sure you want to treat year/month as categorical in this case? Collapsing year as a factor is like collapsing sex. There is no "average" sex. I guess you could fit with each and then average the predictions, but that's really going to mess with standard error calculations – MrFlick Oct 28 '14 at 23:24
  • Also, in the `?predict.merMod` help pages, they have a note that says "There is no option for computing standard errors of predictions because it is difficult to define an efficient method that incorporates uncertainty in the variance parameters; we recommend bootMer for this task." Thus you may want to consider bootstrapping to estimate errors (unless you had a different way to calculate them in mind). – MrFlick Oct 28 '14 at 23:28

1 Answers1

1

You might be interested in the merTools package which includes a couple of functions for creating datasets of counterfactuals and then making predictions on that new data to explore the substantive impact of variables on the outcome. A good example of this comes from the README and the package vignette:

Let's take the case where we want to explore the impact of a model with an interaction term between a category and a continuous predictor. First, we fit a model with interactions:

data(VerbAgg)
fmVA <- glmer(r2 ~ (Anger + Gender + btype + situ)^2 +
       (1|id) + (1|item), family = binomial, 
       data = VerbAgg)

Now we prep the data using the draw function in merTools. Here we draw the average observation from the model frame. We then wiggle the data by expanding the dataframe to include the same observation repeated but with different values of the variable specified by the var parameter. Here, we expand the dataset to all values of btype, situ, and Anger.

# Select the average case
newData <- draw(fmVA, type = "average")
newData <- wiggle(newData, var = "btype", values = unique(VerbAgg$btype))
newData <- wiggle(newData, var = "situ", values = unique(VerbAgg$situ))
newData <- wiggle(newData, var = "Anger", values = unique(VerbAgg$Anger))

head(newData, 10)

#>    r2 Anger Gender btype  situ id        item
#> 1   N    20      F curse other  5 S3WantCurse
#> 2   N    20      F scold other  5 S3WantCurse
#> 3   N    20      F shout other  5 S3WantCurse
#> 4   N    20      F curse  self  5 S3WantCurse
#> 5   N    20      F scold  self  5 S3WantCurse
#> 6   N    20      F shout  self  5 S3WantCurse
#> 7   N    11      F curse other  5 S3WantCurse
#> 8   N    11      F scold other  5 S3WantCurse
#> 9   N    11      F shout other  5 S3WantCurse
#> 10  N    11      F curse  self  5 S3WantCurse

Now we simply pass this new dataset to predictInterval in order to generate predictions for these counterfactuals. Then we plot the predicted values against the continuous variable, Anger, and facet and group on the two categorical variables situ and btype respectively.

plotdf <- predictInterval(fmVA, newdata = newData, type = "probability", 
        stat = "median", n.sims = 1000)
plotdf <- cbind(plotdf, newData)
ggplot(plotdf, aes(y = fit, x = Anger, color = btype, group = btype)) + 
  geom_point() + geom_smooth(aes(color = btype), method = "lm") + 
  facet_wrap(~situ) + theme_bw() +
  labs(y = "Predicted Probability")

enter image description here

jknowles
  • 479
  • 3
  • 13