I am running an analysis in r on the effect of canopy cover (OverheadCover
, proportion bounded by 0,1) and the number of carcasses placed on the same location (CarcassNumber
, factor with 2 levels) on the proportion of carrion eaten by birds (ProportionBirdsScavenging
, proportion bounded by 0,1). I plotted this interaction by modelling the effect of OverheadCover
on ProportionBirdsScavenging
for the separate values of CarcassNumber
, then plotting it in the same graph. Having done this, I saw discrepancies in the SE's calculated by plot_model(glmm_interaction, type = "int")
and the SE's I calculated. Here started an investigation. This journey led me through the source codes of plot_model
, plot_type_int
, ggpredict
and stopped with ggpredict_helper
. I did unfortunately not find the calculation of SE's, but I found proof of the difference in SE's. I always was pretty confident that I calculated my SE's correctly, but now I'm not so sure anymore. Please see my code below.
#rm(list = ls())
library(glmmTMB)
library(dplyr)
data_both <- data.frame(ProportionBirdsScavenging = c(0.406192519926425, 0.871428571428571, 0.452995391705069, 0.484821428571429, 0.795866569978245, 0.985714285714286, 0.208571428571429, 0.573982970671712, 0.694285714285714, 0.930204081632653, 0.0483709273182957, 0.0142857142857143, 0.661904761904762, 0.985714285714286, 0.0142857142857143, 0.0142857142857143),
pointWeight = c(233, 17, 341, 128, 394, 46, 5, 302, 10, 35, 57, 39, 12, 229, 28, 116),
OverheadCover = c(0.671, 0.04, 0.46, 0.65, 0.02, 0, 0.8975, 0.585, 0.6795, 0.0418, 0.5995, 0.6545, 0.02, 0, 0.92, 0.585),
CarcassNumber = as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)),
Area = c("Hamert", "KempenBroek", "KempenBroek", "KempenBroek", "Markiezaat", "Markiezaat", "Meinweg", "Valkenhorst", "Hamert", "KempenBroek", "KempenBroek", "KempenBroek", "Markiezaat", "Markiezaat", "Meinweg", "Valkenhorst"))
data_both$pointWeight_scaled <- scales::rescale(data_both$pointWeight, to = c(0.0001,1)) # rescale weights
glmm_interaction <- glmmTMB(ProportionBirdsScavenging ~ OverheadCover * CarcassNumber + (1|Area), data = data_both, beta_family(link = "logit"), weights = pointWeight_scaled)
ggeffects::ggpredict(glmm_interaction, terms = c("OverheadCover", "CarcassNumber [1:2]")) # SE's calculated by r
# Calculate the SE's for CarcassNumber 1
df_first_carcasses <- filter(data_both, CarcassNumber == 1) # create df with only first carcasses
myglmm <- glmmTMB(ProportionBirdsScavenging ~ OverheadCover, data = df_first_carcasses, beta_family(link = "logit"), weights = pointWeight_scaled)
new.xglmm <- expand.grid(OverheadCover = seq(min(data_both$OverheadCover), max(data_both$OverheadCover), length.out = 1000)) %>%
mutate(Area = "Hamert", pointWeight_scaled = 1) # pad new.xglmm with an arbitrary value for Area and pointWeight_scaled, then exclude them in predict, otherwise error -> https://stackoverflow.com/questions/54411851/mgcv-how-to-use-exclude-argument-in-predict-gam
new.yglmm <- data.frame(predict(myglmm, new.xglmm, type = "link", exclude = c("Area","pointWeight_scaled"), se.fit = TRUE)) %>% # exclude Area and pointWeight_scaled from the prediction
mutate(ProportionBirdsScavenging = plogis(fit)) %>% # calculate the ProportionBirdsScavenging on response scale
rename(SE.untransformed = se.fit, untransformed.predictions = fit)
addTheseglmm1 <- mutate(data.frame(new.xglmm, new.yglmm), # calculate the lwr and upr bounds using the untransformed predictions and SE, then transformed to probability
lwr = plogis(untransformed.predictions - SE.untransformed),
upr = plogis(untransformed.predictions + SE.untransformed))
addTheseglmm1[c(1,45,501,653,707,1000),c(1,6,5)] # compare my SE's to the ggpredict SE's
# calculated by ggpredict > manually calculated
# x | Predicted | SE | > SE.untransformed
# ------------------------- > -------------------
# 0.00 | 0.82 | 0.44 | > 0.43
# 0.04 | 0.80 | 0.41 | > 0.40
# 0.46 | 0.56 | 0.22 | > 0.21
# 0.60 | 0.47 | 0.26 | > 0.25
# 0.65 | 0.44 | 0.29 | > 0.28
# 0.92 | 0.28 | 0.47 | > 0.46
# ( CarcassNumber = 1 )
# Same for CarcassNumber 2
df_second_carcasses <- filter(data_both, CarcassNumber == 2) # create df with only second carcasses
myglmm <- glmmTMB(ProportionBirdsScavenging ~ OverheadCover, data = df_second_carcasses, beta_family(link = "logit"), weights = pointWeight_scaled)
new.xglmm <- expand.grid(OverheadCover = seq(min(data_both$OverheadCover), max(data_both$OverheadCover), length.out = 1000)) %>%
mutate(Area = "Hamert", pointWeight_scaled = 1)
new.yglmm <- data.frame(predict(myglmm, new.xglmm, type = "link", exclude = c("Area","pointWeight_scaled"), se.fit = TRUE)) %>%
mutate(ProportionBirdsScavenging = plogis(fit)) %>%
rename(SE.untransformed = se.fit, untransformed.predictions = fit)
addTheseglmm2 <- mutate(data.frame(new.xglmm, new.yglmm),
lwr = plogis(untransformed.predictions - SE.untransformed),
upr = plogis(untransformed.predictions + SE.untransformed))
addTheseglmm2[c(1,45,501,653,707,1000),c(1,6,5)]
# calculated by ggpredict > manually calculated
# x | Predicted | SE | > SE.untransformed
# ------------------------- > -------------------
# 0.00 | 0.96 | 1.01 | > 1.21
# 0.04 | 0.95 | 0.94 | > 1.10
# 0.46 | 0.16 | 0.86 | > 0.96
# 0.60 | 0.04 | 1.11 | > 1.32
# 0.65 | 0.02 | 1.22 | > 1.46
# 0.92 | 0.00 | 1.83 | > 2.31
# ( CarcassNumber = 2 )
The differences are most visible for CarcassNumber
2. Is my calculation incorrect? I suspect that this discrepancy might have to do with the including and excluding of pointWeight_scaled
in the predict
function. If I don't do this, it returns me an error saying Error in eval(extras, data, env) : object 'pointWeight_scaled' not found
. Is this a common problem? I read here that including and then excluding it resolves the issue mgcv: How to use 'exclude' argument in predict.gam?. Is this not the proper way?
I hope that someone can shed some light on this problem.