I used function ggpredict()
from package ggeffects
1.3.0 and function emmeans()
from package emmeans
to calculate estimates and confidence intervals (hereafter: CI) for a mixed-effect model. The CI estimated by the two functions differ. Why?
The following reproducible example is based on dataset RIKZ (Janssen e Mulder 2005; Zuur et al. 2007), which looks at how species richness (number of species) varies with the height of sampling stations compared to mean tidal level (NAP, in meters) and the level of exposure (factor with three levels):
rm(list=ls())
if (!require(pacman)) install.packages('pacman'); library(pacman)
p_load(emmeans)
p_load(ggplot2)
p_load(ggeffects)
p_load(lme4, lmerTest, glmmTMB)
# get data:
RIKZ <- read.csv(text = RCurl::getURL(
"https://raw.githubusercontent.com/marcoplebani85/datasets/master/RIKZ.csv"))
str(RIKZ)
# "Exposure" is a factor:
RIKZ$Exposure <- as.factor(RIKZ$Exposure)
Here I fit a generalised mixed-effect model with Poisson-distributed residuals to the data:
mem1 <- glmmTMB(Richness ~ NAP+Exposure + (1 | Beach),
family="poisson",
data = RIKZ, REML=T)
The following is a graph with data and predictions according to ggeffects::ggpredict()
:
predicted.richness <- ggpredict(mem1, terms=c("NAP", "Exposure"))
pr <- as.data.frame(predicted.richness)
pr$Exposure <- as.factor(pr$group)
names(pr) <- c("NAP", "predicted", "std.error", "conf.low", "conf.high", "group", "Exposure")
ggplot() +
geom_ribbon(data=pr,
aes(x=NAP, ymin = conf.low, ymax = conf.high, group=Exposure),
fill="grey", alpha = 0.5) +
geom_line(data=pr,
aes(x=NAP, y=predicted, linetype=Exposure)) +
geom_point(data = RIKZ,
aes(y = Richness, x = NAP, shape = Exposure)) +
scale_shape_manual(values = c(1,4,2)) +
theme_classic() +
theme(text = element_text(family="Times", size=15)) +
labs(x = "Height of sampling station\n
compared to mean tidal level (NAP) [m]",
y = "Number of species",
title="Predictions by ggpredict()")
The following is a graph with data and predictions according to emmeans::emmeans()
:
predictions <- as.data.frame(emmeans(mem1,
specs = ~ NAP + Exposure,
at = list(NAP = seq(min(RIKZ$NAP), max(RIKZ$NAP), by=0.1)),
type="response"
))
names(predictions) <- c("NAP", "Exposure", "predicted", "SE", "df", "CI.low", "CI.high")
ggplot() +
geom_ribbon(data=predictions,
aes(x=NAP,ymin = CI.low, ymax = CI.high, group=Exposure),
fill="grey", alpha = 0.5) +
geom_point(data = RIKZ,
aes(y = Richness, x = NAP, shape = Exposure)) +
geom_line(data=predictions,
aes(x=NAP,y=predicted, linetype=Exposure)) +
scale_shape_manual(values = c(1,4,2)) +
theme_classic() +
theme(text = element_text(family="Times", size=15)) +
labs(x = "Height of sampling station\n
compared to mean tidal level (NAP) [m]",
y = "Number of species",
title="Predictions by emmeans()")
Why do the predicted CI estimates differ?
References:
- Janssen, G. and Mulder, S. (2005), Zonation of macrofauna across sandy beaches and surf zones along the Dutch coast. Oceanologia 47(2): 265-282.
- Zuur, A.F., Ieno, E.N., and Smith, G.M. (2007), Analysing ecological data. Springer Science & Business Media