I would like to get the confidence interval (CI) for the predicted mean of a Linear Mixed Effect Model on a large dataset (~40k rows), which is itself a subset of an even larger dataset. This CI is then used for estimating the uncertainty of another calculation that uses the mean and its related CI as input data.
I managed to create a prediction estimate and interval for the full dataset, but a Prediction Interval is not the same and much larger than a CI. Beside bootstrapping (which takes way too much time with this much data), I cannot find a method that would allow me to estimate a CI – either because it is throwing errors or because it only offers to calculate Prediction intervals.
I quite recently moved into LME and I might therefore have overseen some obvious method.
Here is what I did so far in more detail:
The input data is confidential and I can therefore unfortunately not share any extract.
But in general, we have one dependent variable (y) representing the probability of a event and 2 categorical (c1 and c2) and two continuous variables (x1 and x2) with some weighting factor (w1). Some values in the dataset are missing. An extract of the first rows of the data could look like the example below:
c1 | c2 | x1 | x2 | w1 | y |
---|---|---|---|---|---|
London | small | 1 | 10 | NA | NA |
London | small | 1 | 20 | NA | NA |
London | large | 2 | 10 | 0.2 | 0.1 |
Paris | small | 1 | 10 | 0.2 | 0.23 |
Paris | large | 2 | 10 | 0.3 | 0.3 |
Based on this input data, I am then fitting a LMER model in the following form:
lmer1 <- lme4::lmer( y ~ x1 * poly(x2, 5) + ((x1 * poly(x2 ,5)) | c1),
data = df,
weights = w1,
control = lme4::lmerControl(check.conv.singular = lme4::.makeCC(action = "ignore", tol = 1e-3)))
This runs for some minutes and returns several warnings:
Warning messages: 1: In optwrap(optimizer, devfun, getStart(start, rho$pp), lower = rho$lower, : convergence code 5 from nloptwrap: NLOPT_MAXEVAL_REACHED: Optimization stopped because maxeval (above) was reached.
2: In checkConv(attr(opt, “derivs”), opt$par, ctrl = control$checkConv, : unable to evaluate scaled gradient
3: In checkConv(attr(opt, “derivs”), opt$par, ctrl = control$checkConv, : Model failed to converge: degenerate Hessian with 11 negative eigenvalues
I increased the MAXEVAL parameter but this still did not help to get rid of the warnings and I found that despite these warnings, the model is still fitted. I therefore started to apply different methods to get a prediction of the mean for the whole dataset and the related CI for the mean.
predictInterval
I started with creating a Prediction Interval for the full dataset:
predictions <- merTools::predictInterval(lmer1,
newdata = df,
which = "full",
n.sims = 1000,
include.resid.var = FALSE,
level=0.95,
stat="mean")
However, as stated above, the Prediction Interval is not the same as the CI (see also https://datascienceplus.com/prediction-interval-the-wider-sister-of-confidence-interval/).
I found that the general predict function has the option to set interval to either “prediction” or “confidence”, but this option does not exist with the prediction from a LMER object. And I could not find another possibility to switch from Prediction Interval to CI – even though I would believe that the data drawn should be sufficient to do this.
confint
I then saw that there is a function called “confint”, but when running this function I get the following error:
predicition_ci = lme4::confint.merMod(lmer1)
Computing profile confidence intervals ...
Error in zeta(shiftpar, start = opt[seqpar1][-w]) : profiling detected new, lower deviance
In addition: Warning messages:
1: In commonArgs(par, fn, control, environment()) : maxfun < 10 * length(par)^2 is not recommended.
2: In optwrap(optimizer, devfun, x@theta, lower = x@lower, calc.derivs = TRUE, : convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded
I found this thread (Error when estimating CI for GLMM using confint()), which said that I need to reduce the “devtol” parameter by setting a different profile. But doing so results in the same error:
lmer1_devtol = profile(lmer1, devtol = 1e-7)
Error in zeta(shiftpar, start = opt[seqpar1][-w]) : profiling detected new, lower deviance
In addition: Warning messages:
1: In commonArgs(par, fn, control, environment()) : maxfun < 10 * length(par)^2 is not recommended.
2: In optwrap(optimizer, devfun, x@theta, lower = x@lower, calc.derivs = TRUE, : convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded
add_ci
I found the function “add_ci” but this again resulted in another error:
predictions_ci = ciTools::add_ci(df, lmer1,
alpha = 0.05)
Error in levelfun(r, n, allow.new.levels = allow.new.levels) : new levels detected in newdata
I then set the new “allow.new.levels” parameter to TRUE like in the description of the prediction function, but this parameter seems not to be carried through:
predictions_ci = ciTools::add_ci(df, lmer1,
alpha = 0.05,
allow.new.levels = TRUE)
Error in levelfun(r, n, allow.new.levels = allow.new.levels) : new levels detected in newdata
Diag
I found a method to calculate CI intervals for the sleepstudy data, which uses a matrix conversion with diag.
Designmat <- model.matrix(as.formula("y ~ x1 * poly(x2, 5)")[-2], df)
predvar <- diag(Designmat %*% vcov(lmer1) %*% t(Designmat))
#With new data
newdat = df
newdat$pred <- predict(lmer1, newdat, allow.new.levels = TRUE)
Designmat <- model.matrix(formula(lmer1)[-2], newdat)
But the diag method does not work for such large datasets.
bootMer
As said earlier, the boostrapping of the confidence interval with bootMer is taking too much time for this subset of data (I started it 1 day ago and it is still running). I tried to use some parallel processing with the sleepstudy sample data but this could not increase the speed dramatically, so I would assume it will have the same effect on my large dataset.
merBoot <- bootMer(lmer1, predict, nsim = 1000, re.form = NA)
Others
I have read through all these post (and more), but none of them could help me to get the CI in reasonable time for my case. But maybe I have overseen something.
- https://stats.stackexchange.com/questions/344012/confidence-intervals-from-bootmer-in-r-and-pros-cons-of-different-interval-type
- https://stats.stackexchange.com/questions/117641/how-trustworthy-are-the-confidence-intervals-for-lmer-objects-through-effects-pa
- How to get coefficients and their confidence intervals in mixed effects models?
- Error when estimating CI for GLMM using confint()
- https://stats.stackexchange.com/questions/235018/r-extract-and-plot-confidence-intervals-from-a-lmer-object-using-ggplot
- How to get confidence intervals for lmer object?
- Confidence intervals for the predicted probabilities from glmer object, error with bootMer
- https://rdrr.io/cran/ciTools/man/add_ci.lmerMod.html
- Error when estimating Confidence interval in lme4
- https://fromthebottomoftheheap.net/2018/12/10/confidence-intervals-for-glms/
- https://cran.r-project.org/web/packages/merTools/vignettes/Using_predictInterval.html
- https://drewtyre.rbind.io/classes/nres803/week_12/lab_12/