2

I used function ggpredict() from package ggeffects 1.3.0 and function emmeans() from package emmeans to calculate estimates and confidence intervals (hereafter: CI) for a mixed-effect model. The CI estimated by the two functions differ. Why?

The following reproducible example is based on dataset RIKZ (Janssen e Mulder 2005; Zuur et al. 2007), which looks at how species richness (number of species) varies with the height of sampling stations compared to mean tidal level (NAP, in meters) and the level of exposure (factor with three levels):

rm(list=ls())
if (!require(pacman)) install.packages('pacman'); library(pacman)
p_load(emmeans)
p_load(ggplot2)
p_load(ggeffects)
p_load(lme4, lmerTest, glmmTMB)
# get data:
RIKZ <- read.csv(text = RCurl::getURL(
"https://raw.githubusercontent.com/marcoplebani85/datasets/master/RIKZ.csv"))
str(RIKZ)
# "Exposure" is a factor:
RIKZ$Exposure <- as.factor(RIKZ$Exposure)

Here I fit a generalised mixed-effect model with Poisson-distributed residuals to the data:

mem1 <- glmmTMB(Richness ~ NAP+Exposure + (1 | Beach),
                family="poisson",
                data = RIKZ, REML=T)

The following is a graph with data and predictions according to ggeffects::ggpredict():

predicted.richness <- ggpredict(mem1, terms=c("NAP", "Exposure"))
pr <- as.data.frame(predicted.richness)
pr$Exposure <- as.factor(pr$group)
names(pr) <- c("NAP", "predicted", "std.error", "conf.low", "conf.high", "group", "Exposure")
ggplot() +
  geom_ribbon(data=pr, 
              aes(x=NAP, ymin = conf.low, ymax = conf.high, group=Exposure), 
              fill="grey", alpha = 0.5) +
  geom_line(data=pr, 
            aes(x=NAP, y=predicted, linetype=Exposure)) +
  geom_point(data = RIKZ, 
             aes(y = Richness, x = NAP, shape = Exposure)) +
  scale_shape_manual(values = c(1,4,2)) +
  theme_classic() +
  theme(text = element_text(family="Times", size=15)) +
  labs(x = "Height of sampling station\n
       compared to mean tidal level (NAP) [m]",
       y = "Number of species",
       title="Predictions by ggpredict()")

enter image description here

The following is a graph with data and predictions according to emmeans::emmeans():

predictions <- as.data.frame(emmeans(mem1, 
            specs = ~ NAP + Exposure,
            at = list(NAP = seq(min(RIKZ$NAP), max(RIKZ$NAP), by=0.1)),
            type="response"
            ))
names(predictions) <- c("NAP", "Exposure", "predicted", "SE", "df", "CI.low", "CI.high")

ggplot() +
  geom_ribbon(data=predictions, 
              aes(x=NAP,ymin = CI.low, ymax = CI.high, group=Exposure), 
              fill="grey", alpha = 0.5) +
  geom_point(data = RIKZ, 
             aes(y = Richness, x = NAP, shape = Exposure)) +
  geom_line(data=predictions, 
            aes(x=NAP,y=predicted, linetype=Exposure)) +
  scale_shape_manual(values = c(1,4,2)) +
  theme_classic() +
  theme(text = element_text(family="Times", size=15)) +
  labs(x = "Height of sampling station\n
       compared to mean tidal level (NAP) [m]",
       y = "Number of species",
       title="Predictions by emmeans()")

enter image description here

Why do the predicted CI estimates differ?

References:

  • Janssen, G. and Mulder, S. (2005), Zonation of macrofauna across sandy beaches and surf zones along the Dutch coast. Oceanologia 47(2): 265-282.
  • Zuur, A.F., Ieno, E.N., and Smith, G.M. (2007), Analysing ecological data. Springer Science & Business Media
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
Marco Plebani
  • 436
  • 2
  • 14
  • Why aren’t you using ggemmeans instead of ggpredict. – IRTFM Sep 02 '23 at 20:01
  • @IRTFM the difference in prediction confidence of ggpredict() and emmeans is what I am curious about. Function ggemmeans() is simply a ggplot-friendly version of emmeans(), which is what I used here. – Marco Plebani Sep 02 '23 at 21:52
  • You’re asking for prediction on a mixed model first. Are you askin for a mixed model in the second run? – IRTFM Sep 02 '23 at 23:43
  • I think that `ggpredict` is probably using `se.fit=TRUE` from `glmmTMB:::predict.glmmTMB`, which incorporates the joint uncertainty of all model parameters (fixed-effect `beta` and covariance `theta`, as well as random parameters `b`), while `emmeans` might be using only the uncertainty in the fixed effect parameters. This is a good question, I will take another look when I get a chance. – Ben Bolker Sep 03 '23 at 00:16

0 Answers0