I am pretty new to R and am having some trouble finding a straightforward solution to overdispersion in a GLMM with binomial distribution. I have a few different questions listed here. I am mostly finding information that does not consider a random effect or tries to correct for it in a way that I don't understand (and likely am not doing properly). Any help and guidance to resources that clearly outline a course of action would be greatly appreciated.
A little bit about my data - I have treatments with different proportions of infected insects in a population, and I am interested in whether a behavioral response differs among treatments. The response variable is binary, with 1 being the presence of the behavior and 0 being its absence for each individual sampled (N = 276). My data has 106 '0' values and 170 '1' values in total. So I am trying to use a mixed effects model to see if the probability of the behavior of interest changes with treatment, with the mesocosm from which the individuals were sampled as a random effect.
> df
   Treatment Mesocosm Behavior
1         20        3        1
2         20        3        1
3         20        3        1
4         20        3        1
5         20        4        1
6         20        4        1
7         20        4        1
8         20        4        1
9         20        5        1
10        20        5        1
11        20        5        1
12        20        5        0
13        40        6        1
14        40        6        1
15        40        6        1
16        40        6        1
17        40        6        1
18        40        6        0
19        40        6        0
20        40        6        0
- First, I would like to make sure my method for calculating overdispersion is correct.
model <- glmer(Behavior ~ Treatment + (1|Mesocosm), data=df, family=binomial(link = 'logit'))
overdispersion.ratio = deviance(model)/df.residual(model)
I got a ratio of 1.3 - which is high enough that I think I need to correct for it in my model.
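For reference, here is a minimal sketch of the other version of this ratio I keep running into, based on Pearson residuals rather than the deviance. It runs on simulated data (the `sim` data frame, its effect sizes, and the names `fit`, `pearson.ratio`, and `deviance.ratio` are made up for illustration, not my real data or results), so it only shows the calculation. I am not sure whether either ratio is a meaningful check for a 0/1 response.

```r
library(lme4)

# Simulated stand-in for my data (hypothetical values, not my real results):
# 4 treatments, 6 mesocosms per treatment, 10 individuals per mesocosm
set.seed(1)
sim <- data.frame(
  Treatment = rep(c(20, 40, 60, 80), each = 60),
  Mesocosm  = factor(rep(1:24, each = 10))
)
u <- rnorm(24, sd = 0.5)  # one random intercept per mesocosm
eta <- -0.5 + 0.02 * sim$Treatment + u[as.integer(sim$Mesocosm)]
sim$Behavior <- rbinom(nrow(sim), size = 1, prob = plogis(eta))

# Same model structure as in my question
fit <- glmer(Behavior ~ Treatment + (1 | Mesocosm),
             data = sim, family = binomial(link = "logit"))

# Pearson-based ratio: sum of squared Pearson residuals over residual df
pearson.ratio <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

# Deviance-based ratio, i.e. the calculation I used above
deviance.ratio <- deviance(fit) / df.residual(fit)

print(c(pearson = pearson.ratio, deviance = deviance.ratio))
```

On my real data I would replace `fit` with `model` and compare the two numbers.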
- A common solution I am finding is to change the distribution from binomial to quasibinomial. Yet I am getting an error saying I cannot use a quasi family with a mixed effects model and am reading that people are having similar issues online.
Error in lme4::glFormula(formula = Behavior ~ Treatment + (1|Mesocosm), data = df, :
"quasi" families cannot be used in glmer
Some people seem to be working around this with the glm.nb function - but I am unsure of what it is doing to my data, so I am a bit wary of just going ahead and using it.
- Can someone explain to me how to find the dispersion parameter (theta) that can be put in the model to correct for overdispersion? I am having trouble understanding the formulas I'm seeing. If someone could define the terms more clearly, that would be much appreciated.
I'd appreciate any guidance on solving this problem and some patience as I am far from a statistician and am new to this type of analysis. Thanks!