Non-parametric bootstrapping to generate 95% Confidence Intervals for fixed effect coefficients calculated by a glmer with nested random effects

Question

I have an R coding question.

This is my first time asking a question here, so apologies if I am unclear or do something wrong. I am trying to use a Generalized Linear Mixed Model (GLMM) with Poisson error family to test for any significant effect on a count response variable by three separate dichotomous variables (AGE = ADULT or JUVENILE, SEX = MALE or FEMALE and MEDICATION = NEW or OLD) and an interaction between AGE and MEDICATION (AGE:MEDICATION). There is some dependency in my data in that the data was collected from a total of 22 different sites (coded as SITE vector with 33 distinct levels), and the data was collected over a total of 21 different years (coded as YEAR vector with 21 distinct levels, and treated as a categorical variable). Unfortunately, every SITE was not sampled for each YEAR, with some being sampled for a greater number of years than others. The data is also quite sparse, in that I do not have a great number of measurements of the response variable (coded as COUNT and an integer vector) per SITE per YEAR.

My Poisson GLMM is constructed using the following code:

model <- glmer(data = mydata,
                  family = poisson(link = "log"),
                  formula = COUNT ~ SEX + SEX:MEDICATION + AGE + AGE:SEX + MEDICATION + AGE:MEDICATION + (1|SITE/YEAR), 
              offset = log(COUNT.SAMPLE.SIZE),
              nAGQ = 0)

In order to try and obtain more reliable estimates for the fixed effect coefficients (particularly given the sparse nature of my data), I am trying to obtain 95% confidence intervals for the fixed effect coefficients through non-parametric bootstrapping.

I have come across the "glmmboot" package which can be used to conduct non-parametric bootstrapping of GLMMs, however when I try to run the non-parametric bootstrapping using the following code:

library(glmmboot) 
bootstrap_model(base_model = model,
                  base_data = mydata,
                  resamples = 1000)

When I run this code, I receive the following message:

Performing case resampling (no random effects)

Naturally, though, my model does have random effects, namely (1|SITE/YEAR). If I try to tell the function to resample from a specific block, by adding in the "reample_specific_blocks" argument, i.e.:

library(glmmboot) 
bootstrap_model(base_model = model,
                  base_data = mydata,
                  resamples = 1000,
                  resample_specific_blocks = "YEAR")

Then I get the following error message:

Performing block resampling, over SITE Error: Invalid grouping factor specification, YEAR:SITE

I get a similar error message if I try set 'resample_specific_blocks' to "SITE". If I then try to set 'resample_specific_blocks' to "SITE:YEAR" or "SITE/YEAR" I get the following error message:

Error in bootstrap_model(base_model = model, base_data = mydata, resamples = 1000, : No random columns from formula found in resample_specific_blocks

I have tried explicitly nesting YEAR within SITE and then adapting the model accordingly using the code:

mydata <- within(mydata, SAMPLE <- factor(SITE:YEAR))
model.refit <- glmer(data = mydata,
                  family = poisson(link = "log"),
                  formula = COUNT ~ SEX + AGE + MEDICATION + AGE:MEDICATION + (1|SITE) + (1|SAMPLE), 
              offset = log(COUNT.SAMPLE.SIZE),
              nAGQ = 0)
bootstrap_model(base_model = model.refit,
                  base_data = mydata,
                  resamples = 1000,
                  resample_specific_blocks = "SAMPLE")

But unfortunately I just get this error message:

Error: Invalid grouping factor specification, SITE

The same error message comes up if I set resample_specific_blocks argument to SITE, or if I just remove the resample_specific_blocks argument.

I believe that the case_bootstrap() function found in the lmeresampler package could potentially be another option, but when I look into the help for it it looks like I would need to create a function and I unfortunately have no experience with creating my own functions within R.

If anyone has any suggestions on how I can get the bootstrap_model() function in the glmmboot package to recognise the random effects in my model/dataframe, or any suggestions for alternative methods on conducting non-parametric bootstrapping to create 95% confidence intervals for the coefficients of the fixed effects in my model, it would be greatly appreciated! Many thanks in advance, and for reading such a lengthy question!

For reference, I include links to the RDocumentation and GitHub for the glmmboot package: https://www.rdocumentation.org/packages/glmmboot/versions/0.6.0

https://github.com/ColmanHumphrey/glmmboot

The following is code that will allow for creation of a reproducible example using the data set from lme4::grouseticks

#Load in required packages
library(tidyverse)
library(lme4)
library(glmmboot)
library(psych)
#Load in the grouseticks dataframe
data("grouseticks")
tibble(grouseticks)
#Create dummy vectors for SEX, AGE and MEDICATION
set.seed(1)
SEX <-sample(1:2, size = 403, replace = TRUE)
SEX <- as.factor(ifelse(SEX == 1, "MALE", "FEMALE"))
set.seed(2)
AGE <- sample(1:2, size = 403, replace = TRUE)
AGE <- as.factor(ifelse(AGE == 1, "ADULT", "JUVENILE"))
set.seed(3)
MEDICATION <- sample(1:2, size = 403, replace = TRUE)
MEDICATION <- as.factor(ifelse(MEDICATION == 1, "OLD", "NEW"))
grouseticks$SEX <- SEX
grouseticks$AGE <- AGE
grouseticks$MEDICATION <- MEDICATION
#Use the INDEX vector to create a vector of sample sizes per LOCATION
#per YEAR
grouseticks$INDEX <- 1
sample.sizes <- grouseticks %>%
    group_by(LOCATION, YEAR) %>%
    summarise(SAMPLE.SIZE = sum(INDEX))
#Combine the dataframes together into the dataframe to be used in the
#model
mydata$SAMPLE.SIZE <- as.integer(mydata$SAMPLE.SIZE)
#Create the Poisson GLMM model
model <- glmer(data = mydata,
              family = poisson(link = "log"),
              formula = TICKS ~ SEX + SEX + AGE + MEDICATION + AGE:MEDICATION + (1|LOCATION/YEAR),
          nAGQ = 0)
#Attempt non-parametric bootstrapping on the model to get 95%
#confidence intervals for the coefficients of the fixed effects
set.seed(1)
Model.bootstrap <- bootstrap_model(base_model = model,
                  base_data = mydata,
                  resamples = 1000)
Model.bootstrap

It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input that can be used to test and verify possible solutions. — MrFlick, Aug 04 '22 at 15:28
You could use the data set from `?lme4::grouseticks` to make a reproducible example that is similar to yours ... — Ben Bolker, Aug 04 '22 at 15:58
Hi @MrFlick, thank you very much for getting in touch. I have edited the question to include some code that gives a reproducible example at the end. — Angus, Aug 05 '22 at 09:49
Hi @BenBolker, thanks very much for getting in touch and for your suggestion. The `lme4::grouseticks` data set works very nicely here, as the TICKS vector has a Poisson distribution, which is the same as the response vector in my own data set. — Angus, Aug 05 '22 at 09:51
And it has a similar distribution of YEARS sampled per LOCATION in that not every LOCATION was sampled every YEAR, which again is similar to my own dataset. The SAMPLE.SIZE per LOCATION per YEAR which I create in the code above is also very similar to my own data set. — Angus, Aug 05 '22 at 10:00

Non-parametric bootstrapping to generate 95% Confidence Intervals for fixed effect coefficients calculated by a glmer with nested random effects

0 Answers0