

Using R, I generated data from normal distributions, but when I fitted a random-intercept mixed-effects model to them, both the Kolmogorov-Smirnov test for normality and the quantile-quantile (Q-Q) plot produced by the R package DHARMa indicated deviations from normality. Why is that?

Test 1

I generated a set of random-intercept data in which both the number of replicates and the within-group standard deviation vary across the levels of the random effect:

rm(list=ls())
library(DHARMa)
library(glmmTMB)
library(ggplot2)

# "regression" with an 8-level RE with group-specific intercepts and within-group SDs:
set.seed(666)
ii <- 8 # parameter set size; no. of groups
reps <- sample(x = 8:12, size = ii, replace = TRUE)
slope <- 3
intercept <- runif(ii, min = 70, max = 140)
group.sd <- runif(ii, min = 5, max = 15)
group.ID <- LETTERS[1:ii]

xx <- 1:15

out.list <- list() # empty list to store simulated data

for (i in 1:ii){
    df00 <- data.frame(Group = rep(group.ID[i], reps[i] * length(xx)),
                       x = rep(xx, reps[i]))
    df00$y <- intercept[i] + df00$x * slope + rnorm(nrow(df00), 0, group.sd[i])
    out.list[[i]] <- df00
}

# to turn out.list into a data.frame:
df <- do.call(rbind.data.frame, out.list)

# data visualisation
ggplot(df, aes(x = x, y = y, colour = Group)) + 
    geom_point() + geom_smooth(method="lm") + theme_classic()

*(figure: simulated data coloured by Group, with per-group linear fits)*

If I fit a random-intercept model to my dataset:

glmmTMB_ri <- glmmTMB(y ~ x + (1 | Group), data = df)

These are the diagnostic plots and tests produced by DHARMa:

mem1Resids <- simulateResiduals(glmmTMB_ri, n = nrow(df) * 3)
par(mfrow = c(1, 2))
plotQQunif(mem1Resids)
mtext(text = "(a)", side = 3, adj = 0, line = 2)
plotResiduals(mem1Resids, quantreg = TRUE)
mtext(text = "(b)", side = 3, adj = 0, line = 2)

*(figure: (a) DHARMa Q-Q plot with KS test; (b) residuals vs. model predictions with quantile regression)*

Test 2

I thought the issue could be that the resolution of the simulated residuals was too low, but increasing the number of simulations drawn by simulateResiduals() from three times to ten times the size of my dataset doesn't improve the diagnostic plots:

*(figure: the same DHARMa diagnostics with n = 10 × the dataset size)*
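For reference, the only change from the earlier call is the n argument (a sketch; n controls how many simulations DHARMa draws per observation):

```r
library(DHARMa)

# same model as before; only the number of simulations changes
mem1Resids10 <- simulateResiduals(glmmTMB_ri, n = nrow(df) * 10)
plotQQunif(mem1Resids10)
plotResiduals(mem1Resids10, quantreg = TRUE)
```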

Test 3

I thought the issue might not be with DHARMa's settings but with my dataset (perhaps too small or too variable), so I generated a new dataset with larger, uniform replication for each random group (40 observations per value of x per group) and with lower, uniform within-group variability (SD = 5 for all random groups):

reps <- rep(40, length.out=ii)
# let's also reduce and uniformise maximum within-group variability:
group.sd <- rep(5, ii)

out.list <- list() # empty list to store simulated data

for (i in 1:ii){
    df00 <- data.frame(Group = rep(group.ID[i], reps[i] * length(xx)),
                       x = rep(xx, reps[i]))
    df00$y <- intercept[i] + df00$x * slope + rnorm(nrow(df00), 0, group.sd[i])
    out.list[[i]] <- df00
}

# to turn out.list into a data.frame:
df2 <- do.call(rbind.data.frame, out.list)

# data visualisation
ggplot(df2, aes(x = x, y = y, colour = Group)) +
    geom_point() + geom_smooth(method="lm") + theme_classic()

*(figure: second simulated dataset coloured by Group, with per-group linear fits)*

If I fit a random-intercept model to my new dataset:

glmmTMB_ri2 <- glmmTMB(y ~ x + (1 | Group), data = df2)

These are the diagnostic plots and tests produced by DHARMa:

mem2Resids <- simulateResiduals(glmmTMB_ri2, n = nrow(df2) * 3)
par(mfrow = c(1, 2))
plotQQunif(mem2Resids)
mtext(text = "(a)", side = 3, adj = 0, line = 2)
plotResiduals(mem2Resids, quantreg = TRUE)
mtext(text = "(b)", side = 3, adj = 0, line = 2)

*(figure: DHARMa diagnostics (a) and (b) for the model fitted to the second dataset)*

If anything, the diagnostic plots show a worse situation than they did before.

Why does DHARMa indicate non-normality of my models' residual distributions even though I generated my data to be Gaussian? If the problem isn't the sample size or DHARMa, perhaps I simulated my data incorrectly, but that doesn't seem to be the case either.

Marco Plebani

3 Answers


Disclaimer: I'm the developer of DHARMa.

First of all, for DHARMa-specific questions, please also consider using our issue tracker https://github.com/florianhartig/DHARMa/issues

Second, regarding your question/results: when you simulate with different SDs for the REs, you violate the assumptions of the mixed model, which assumes that REs come from a normal distribution with a common standard deviation. EDIT: moreover, it seems to me you are drawing the random intercepts from a uniform distribution, whereas a standard mixed model assumes a normal distribution.
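For illustration, the standard assumptions would correspond to generating the group-level parameters roughly like this (a sketch with illustrative values, not the OP's code):

```r
set.seed(666)
ii <- 8
# normal random intercepts around a common mean, with a single RE standard deviation
intercept <- rnorm(ii, mean = 105, sd = 20)
# one residual SD shared by all groups (homoscedastic errors)
group.sd <- rep(10, ii)
```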

Thus, I'm not sure why you are concerned: I would say it's rather encouraging that DHARMa highlights this problem. If your question is about the specific reason why it is highlighted: the DHARMa default is to re-simulate unconditionally, i.e. including the REs. If you switched to simulations conditional on the fitted REs, which corresponds more closely to standard residual checks, the problem would likely be less visible.
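For example, with an lme4 fit of the same model, conditional simulations can be requested by passing re.form = NULL through simulateResiduals() (a sketch; re.form is forwarded to simulate(), and conditional simulation may not be supported for every model class):

```r
library(lme4)
library(DHARMa)

lmer_ri <- lmer(y ~ x + (1 | Group), data = df)
# default: unconditional simulations (new REs are drawn each time)
res_uncond <- simulateResiduals(lmer_ri)
# conditional on the fitted REs, closer to classical residual checks
res_cond <- simulateResiduals(lmer_ri, re.form = NULL)
plot(res_cond)
```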

Finally, note that DHARMa has a function createData() that allows you to create data according to the assumptions of a mixed model.
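A minimal sketch (argument names as in current DHARMa; see ?createData for the full list):

```r
library(DHARMa)
library(glmmTMB)

# data generated under the mixed-model assumptions:
# normal random intercepts with a common variance, homoscedastic Gaussian errors
testData <- createData(sampleSize = 480, numGroups = 8,
                       randomEffectVariance = 1, family = gaussian())
m <- glmmTMB(observedResponse ~ Environment1 + (1 | group), data = testData)
plot(simulateResiduals(m))
```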

  • Many thanks! If I understand correctly, your answer explains the scenario I observed in my **"Test 1"** simulation: I simulated a random effect with heterogeneous within-group variance. The answer provided by @user12256545 explains the scenario I observed in my **"Test 2"** simulation: larger N in `simulateResiduals()` led to higher power for the Kolmogorov-Smirnov (KS) test. I still don't understand why the KS test is significant and the Q-Q plot is so skewed in my **"Test 3"** simulation, in which the standard deviations were all the same (SD=5) for each level of the random effect. – Marco Plebani Jun 16 '23 at 15:57
  • ok, after reading your code again, it seems to me that in all cases, you have a random intercept that is actually drawn from a normal distribution? You never use the RE sd afaics? – Florian Hartig Jun 18 '23 at 18:37
  • I simulated observations as y=intercept + slope * x + epsilon with group-specific intercept <- runif(ii, min = 70, max = 140). Epsilon had group-specific SD in ***Test 1*** and ***Test 2***, while I fixed it to SD=5 for all observations in ***Test 3***. – Marco Plebani Jun 19 '23 at 06:53
  • (where ii is the ii-th group) – Marco Plebani Jun 19 '23 at 07:08
  • Sorry, typo - you are drawing the random intercepts from a uniform distribution, that's what I meant to write. This is not the assumption of a mixed effects model – Florian Hartig Jun 19 '23 at 10:21
---

To add another perspective, using performance::check_model() (which, as @FlorianHartig points out, would correspond to using conditional rather than unconditional simulation in DHARMa) on the first model gives:

*(figure: performance::check_model() diagnostic panel for the first model)*
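For reference, the call producing this panel is just (a sketch, assuming the model object glmmTMB_ri from the question):

```r
library(performance)

# panel of diagnostics computed from the fitted model (conditional on the fitted REs)
check_model(glmmTMB_ri)
```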

The two deviations of your simulation from mixed-model assumptions (uniform distribution of REs and group-specific SDs) are indicated by:

  • flatter marginal distribution relative to posterior predictions (top left plot) (both?)
  • slightly decreasing variance with mean (middle left) (both?)
  • fat-tailed Q-Q plot (middle/bottom right) (group SDs?)
  • slightly deviant distribution of RE values (bottom left) (uniform REs?)

Looking at your second set of simulations (homogeneous SDs by group):

*(figure: performance::check_model() diagnostic panel for the second model)*

  • As far as the conditional performance is concerned, this now looks pretty good (heteroscedasticity and fat tails of residuals are gone)
Ben Bolker
---

This is not an error in the implementation of DHARMa but a statistical issue with normality testing.

The results of your Kolmogorov-Smirnov test are what you would expect from it: it tests whether the data deviate from H0, namely that the data are normally distributed.

Since you simulate these values with mean 0 and a standard deviation between 5 and 15 (group.sd <- runif(ii, min = 5, max = 15)), in the worst case you get an SD of ~15. Further, the more samples you generate with rnorm, the higher the absolute number of points that deviate from the reference normal distribution, even if their proportion stays the same, which ramps up the power of the KS test (larger N -> more test power).
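The N-dependence is easy to demonstrate directly: draw from a distribution that deviates only slightly from the reference normal (here a t-distribution rescaled to unit variance, i.e. slightly heavy tails) and watch how the KS p-value shrinks as N grows (a sketch; exact p-values depend on the seed):

```r
set.seed(1)
# t(5) rescaled to unit variance: same mean and SD as N(0,1), fatter tails
rt_std <- function(n, df = 5) rt(n, df) / sqrt(df / (df - 2))

ks.test(rt_std(100),    "pnorm")$p.value  # small N: deviation usually undetected
ks.test(rt_std(100000), "pnorm")$p.value  # large N: typically highly significant
```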

You can read more on this at:

https://stats.stackexchange.com/questions/333892/is-the-kolmogorov-smirnov-test-too-strict-if-the-sample-size-is-large

And check out suggested alternatives discussed here:

https://stats.stackexchange.com/questions/465196/kolmogorov-smirnov-test-statistic-interpretation-with-large-samples

user12256545
  • Hi! *larger N -> more test power* is a good point, but the issue isn't only with the Kolmogorov-Smirnov test: the Q-Q plots look skewed too. I'm not sure whether mixed-effect models assume the levels of the random effect to have the same variance ***(do they?)***, but in my third test I simulated a new dataset in which all levels of the random effect had the same replication level and the same standard deviation, and the Q-Q plot got worse (see ***Test 3***). – Marco Plebani Jun 15 '23 at 08:36