[Main edits in italics]
Using R I generated data from normal distributions but, when I fitted a random-intercept mixed-effect model to them, both the Kolmogorov-Smirnoff test for normality and the Quantile-Quantile plot performed by the R
package DHARMa
indicated deviations from normality. Why is that?
Test 1
I generated a set of random-intercept data in which both the number of replicates and the within-group standard deviation varies for each level of the random effect:
rm(list=ls())
library(DHARMa)
library(glmmTMB)
library(ggplot2)
# "regression" with an 8-level RE with variable slopes:
set.seed(666)
ii=8 # parameter set size; no. groups
reps <- sample(x=8:12, size=ii, replace=T)
slope <- 3
intercept <- runif(ii, min = 70, max = 140)
group.sd <- runif(ii, min = 5, max = 15)
group.ID <- LETTERS[1:ii]
xx <- 1:15
out.list <- list() # empty list to store simulated data
for (i in 1:ii){
df00 <- data.frame(Group=rep(group.ID[i], reps[i]*length(xx)),
x=rep(xx, reps[i]))
df00$y = intercept[i] + df00$x * slope + rnorm(length(df00[,1]), 0, group.sd[i])
out.list[[i]] = df00
}
# to turn out.list into a data.frame:
df <- do.call(rbind.data.frame, out.list)
# data visualisation
ggplot(df, aes(x = x, y = y, colour = Group)) +
geom_point() + geom_smooth(method="lm") + theme_classic()
If I fit a random-intercept model to my dataset:
glmmTMB_ri <- glmmTMB(y~x + (1|Group), data=df)
These are the diagnostic plots and tests produced by DHARMa
:
mem1Resids <- simulateResiduals(glmmTMB_ri, n = length(df[, 1]) * 3)
par(mfrow = c(1, 2))
plotQQunif(mem1Resids)
mtext(text = "(a)", side = 3, adj = 0, line = 2)
plotResiduals(mem1Resids, quantreg = T)
mtext(text = "(b)", side = 3, adj = 0, line = 2)
Test 2
I thought that the issue could that the resolution of the simulated residuals was too low. Increasing the number of simulated observations by simulateResiduals()
from three times the size of my dataset to 10 times the size of my dataset doesn't improve the diagnostic plots:
Test 3
I thought the issue might not be with DHARMa
's settings but with my dataset (perhaps too small or too variable). I generated a new dataset with larger, uniform replication levels for each random group (40 observations for each value of x for each group) and with lower, uniform whithin-group variability (SD=5 for all random groups):
reps <- rep(40, length.out=ii)
# let's also reduce and uniformise maximum within-group variability:
group.sd <- rep(5, ii)
out.list <- list() # empty list to store simulated data
for (i in 1:ii){
df00 <- data.frame(Group=rep(group.ID[i], reps[i]*length(xx)),
x=rep(xx, reps[i]))
df00$y = intercept[i] + df00$x * slope + rnorm(length(df00[,1]), 0, group.sd[i])
out.list[[i]] = df00
}
# to turn out.list into a data.frame:
df2 <- do.call(rbind.data.frame, out.list)
# data visualisation
ggplot(df2, aes(x = x, y = y, colour = Group)) +
geom_point() + geom_smooth(method="lm") + theme_classic()
If I fit a random-intercept model to my new dataset:
glmmTMB_ri2 <- glmmTMB(y~x + (1|Group), data=df2)
These are the diagnostic plots and tests produced by DHARMa
:
mem2Resids <- simulateResiduals(glmmTMB_ri2, n = length(df2[, 1]) * 3)
par(mfrow = c(1, 2))
plotQQunif(mem2Resids)
mtext(text = "(a)", side = 3, adj = 0, line = 2)
plotResiduals(mem2Resids, quantreg = T)
mtext(text = "(b)", side = 3, adj = 0, line = 2)
If anything, the diagnostic plots show a worse situation than they did before.
Why does DHARMa indicate non-normality of my models' residual's distributions even if I generated my data to be gaussian? If the problem isn't with the sample size or with DHARMa, perhaps I simulated my data incorrectly, but that doesn't seem to be the case either.