Confidence Interval of the predicted mean of a LMER object for large dataset

Question

I would like to get the confidence interval (CI) for the predicted mean of a Linear Mixed Effect Model on a large dataset (~40k rows), which is itself a subset of an even larger dataset. This CI is then used for estimating the uncertainty of another calculation that uses the mean and its related CI as input data.

I managed to create a prediction estimate and interval for the full dataset, but a Prediction Interval is not the same as a CI and is much wider. Besides bootstrapping (which takes far too long with this much data), I cannot find a method that would allow me to estimate a CI, either because the method throws errors or because it only offers Prediction Intervals.

I only recently moved into LME and might therefore have overlooked some obvious method.

Here is what I did so far in more detail:

The input data is confidential and I can therefore unfortunately not share any extract.

But in general, we have one dependent variable (y) representing the probability of an event, two categorical variables (c1 and c2), and two continuous variables (x1 and x2), with a weighting factor (w1). Some values in the dataset are missing. An extract of the first rows of the data could look like the example below:

c1	c2	x1	x2	w1	y
London	small	1	10	NA	NA
London	small	1	20	NA	NA
London	large	2	10	0.2	0.1
Paris	small	1	10	0.2	0.23
Paris	large	2	10	0.3	0.3

Based on this input data, I am then fitting a LMER model in the following form:

lmer1 <- lme4::lmer( y ~ x1 * poly(x2, 5) + ((x1 * poly(x2 ,5)) | c1),
                            data = df,
                            weights = w1,
                            control = lme4::lmerControl(check.conv.singular = lme4::.makeCC(action = "ignore", tol = 1e-3)))

This runs for some minutes and returns several warnings:

Warning messages:

1: In optwrap(optimizer, devfun, getStart(start, rho$pp), lower = rho$lower, : convergence code 5 from nloptwrap: NLOPT_MAXEVAL_REACHED: Optimization stopped because maxeval (above) was reached.

2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : unable to evaluate scaled gradient

3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge: degenerate Hessian with 11 negative eigenvalues

I increased the MAXEVAL parameter but this still did not help to get rid of the warnings and I found that despite these warnings, the model is still fitted. I therefore started to apply different methods to get a prediction of the mean for the whole dataset and the related CI for the mean.

predictInterval

I started with creating a Prediction Interval for the full dataset:

predictions <- merTools::predictInterval(lmer1,
                                       newdata = df,
                                       which = "full",
                                       n.sims = 1000,
                                       include.resid.var = FALSE,
                                       level=0.95,
                                       stat="mean")

However, as stated above, the Prediction Interval is not the same as the CI (see also https://datascienceplus.com/prediction-interval-the-wider-sister-of-confidence-interval/).

I found that the general predict function has the option to set interval to either “prediction” or “confidence”, but this option does not exist with the prediction from a LMER object. And I could not find another possibility to switch from Prediction Interval to CI – even though I would believe that the data drawn should be sufficient to do this.
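To illustrate the distinction for an ordinary linear model, here is a minimal sketch using simulated data and a plain lm() fit (both are assumptions for illustration, not the confidential data or the mixed model). It only shows how the interval argument of predict() switches between the two interval types and that the prediction interval is the wider one:

```r
# Simulated stand-in data; the real dataset is not available
set.seed(42)
d <- data.frame(x = runif(200, 0, 10))
d$y <- 1 + 0.5 * d$x + rnorm(200, sd = 1)

fit <- lm(y ~ x, data = d)
new <- data.frame(x = c(2, 5, 8))

# Confidence interval: uncertainty about the mean response at each x
conf_int <- predict(fit, newdata = new, interval = "confidence", level = 0.95)

# Prediction interval: uncertainty about a single new observation at each x
pred_int <- predict(fit, newdata = new, interval = "prediction", level = 0.95)

ci_width <- conf_int[, "upr"] - conf_int[, "lwr"]
pi_width <- pred_int[, "upr"] - pred_int[, "lwr"]

# The point estimates agree; only the interval widths differ
print(cbind(ci_width, pi_width))
```

The prediction interval adds the residual variance on top of the uncertainty in the estimated mean, which is why it stays wide even when the mean is estimated precisely.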

confint

I then saw that there is a function called “confint”, but when running this function I get the following error:

prediction_ci = lme4::confint.merMod(lmer1)

Computing profile confidence intervals ...

Error in zeta(shiftpar, start = opt[seqpar1][-w]) : profiling detected new, lower deviance

In addition: Warning messages:

1: In commonArgs(par, fn, control, environment()) : maxfun < 10 * length(par)^2 is not recommended.

2: In optwrap(optimizer, devfun, x@theta, lower = x@lower, calc.derivs = TRUE, : convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded

I found this thread (Error when estimating CI for GLMM using confint()), which said that I need to reduce the “devtol” parameter by setting a different profile. But doing so results in the same error:

lmer1_devtol = profile(lmer1, devtol = 1e-7)

Error in zeta(shiftpar, start = opt[seqpar1][-w]) : profiling detected new, lower deviance

In addition: Warning messages:

1: In commonArgs(par, fn, control, environment()) : maxfun < 10 * length(par)^2 is not recommended.

2: In optwrap(optimizer, devfun, x@theta, lower = x@lower, calc.derivs = TRUE, : convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded

add_ci

I found the function “add_ci” but this again resulted in another error:

predictions_ci = ciTools::add_ci(df, lmer1,
                                      alpha = 0.05)

Error in levelfun(r, n, allow.new.levels = allow.new.levels) : new levels detected in newdata

I then set the new “allow.new.levels” parameter to TRUE like in the description of the prediction function, but this parameter seems not to be carried through:

predictions_ci = ciTools::add_ci(df, lmer1,
                                      alpha = 0.05,
                                      allow.new.levels = TRUE)

Error in levelfun(r, n, allow.new.levels = allow.new.levels) : new levels detected in newdata

Diag

I found a method to calculate CIs for the sleepstudy data, which uses a matrix product with diag.

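# Fixed-effects design matrix; [-2] drops the response from the formula
Designmat <- model.matrix(as.formula("y ~ x1 * poly(x2, 5)")[-2], df)
# Variance of the fixed-effects prediction for every row
predvar <- diag(Designmat %*% vcov(lmer1) %*% t(Designmat))

# With new data
newdat <- df
newdat$pred <- predict(lmer1, newdat, allow.new.levels = TRUE)

# fixed.only = TRUE keeps the random-effects terms out of the design matrix
Designmat <- model.matrix(formula(lmer1, fixed.only = TRUE)[-2], newdat)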

But the diag method does not work for such large datasets.
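One likely reason, stated here as an assumption since no error message is shown, is memory: Designmat %*% vcov(lmer1) %*% t(Designmat) builds a full n x n matrix before diag() keeps only its diagonal, and for 40,000 rows that is about 1.6 billion doubles (roughly 13 GB). The diagonal can be computed row by row instead. The sketch below uses simulated data and a plain lm() fit as a stand-in for the mixed model; as with the diag recipe, it reflects only the uncertainty in the fixed-effect coefficients.

```r
# Simulated stand-in data and model (assumptions for illustration)
set.seed(1)
n <- 2000
d <- data.frame(x1 = rnorm(n), x2 = runif(n))
d$y <- 0.3 + 0.1 * d$x1 + 0.2 * d$x2 + rnorm(n, sd = 0.05)
fit <- lm(y ~ x1 * poly(x2, 3), data = d)

X <- model.matrix(fit)  # n x p design matrix
V <- vcov(fit)          # p x p covariance of the coefficients

# diag(X %*% V %*% t(X)) without ever forming the n x n matrix:
# row i of (X %*% V) times row i of X, summed, is the i-th diagonal entry
predvar_fast <- rowSums((X %*% V) * X)

# Cross-check against the explicit diag() on a small subset of rows
idx <- 1:50
predvar_slow <- diag(X[idx, ] %*% V %*% t(X[idx, ]))

# Normal-approximation 95% CI for the predicted mean of each row
se <- sqrt(predvar_fast)
ci <- cbind(fit = fitted(fit),
            lwr = fitted(fit) - qnorm(0.975) * se,
            upr = fitted(fit) + qnorm(0.975) * se)
```

For a fitted lmer object the same two lines would use the fixed-effects design matrix and vcov() of that model; memory then grows with n times the number of fixed-effect coefficients rather than with n squared.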

bootMer

As said earlier, bootstrapping the confidence interval with bootMer takes too much time for this subset of data (I started it one day ago and it is still running). I tried parallel processing with the sleepstudy sample data, but this did not increase the speed dramatically, so I assume it would have the same limited effect on my large dataset.

merBoot <- bootMer(lmer1, predict, nsim = 1000, re.form = NA)

Others

I have read through all these posts (and more), but none of them helped me get the CI in reasonable time for my case. But maybe I have overlooked something.

DrJerryTAO · Answer 1 · 2022-12-01T07:48:43.383

Unsurprisingly to me but unfortunately for you, the nonconvergence of the mixed model estimation and the difficulty in generating confidence intervals result from the misuse of a linear model for data with a limited dependent variable. Proceeding because "despite these warnings, the model is still fitted" is a dangerous practice, since estimates from an optimization that has not converged should not be used for predictions. As you described, the dependent variable (y) represents the probability of an event, which is a continuous variable between zero and one. Using a linear model to predict a probability constitutes a linear probability regression, which requires censoring predicted outcomes (e.g. forcing all predicted values greater than .99 to be .99 while forcing all predicted values smaller than .01 to be .01) and adjusting for heterogeneous variances using weighted least squares (see https://bookdown.org/ccolonescu/RPoE4/heteroskedasticity.html). Giving the continuous variables both fixed and random effects also burdens convergence, and some or all of the random effects of the continuous variables may not be necessary. The use of weights can also be problematic.

Instead of a linear probability regression, beta regression works best for dependent variables which are proportions and probabilities. Beta regression without random effects is done in betareg::betareg(). glmmTMB::glmmTMB() handles beta regression with random effects. Start from a simple setting where only the intercept has random effects such as

glmmTMB(y ~ 1 + x1 * poly(x2, 5) + c2 + (1 | c1), family = beta_family(link = "logit"), data = df)

You may compare the result with glmer() and lmer()

glmer(y ~ 1 + x1 * poly(x2, 5) + c2 + (1 | c1), family = gaussian(link = "logit"), data = df)

lmer(log(y/(1-y)) ~ 1 + x1 * poly(x2, 5) + c2 + (1 | c1), data = df)

glmer() and lmer() with the above specifications are closely related but not identical: lmer() assumes normal residuals when predicting log(y/(1-y)), whereas glmer() with gaussian(link = "logit") models the mean of y through a logit link with normal residuals on the scale of y. glmmTMB(), in contrast, assumes that y follows a beta distribution. lmer() results are easier to explain and receive wider support from other packages, since they come from a linear model. On the other hand, glmmTMB() may fit better according to AIC, BIC, and log likelihood. Note that all three require y to lie strictly in (0, 1), boundaries excluded. To include occasional zeros and ones, adjust the observations at both boundaries by a small tolerance, usually equal to half of the smaller distance from a boundary to its closest observed value (see https://stats.stackexchange.com/questions/109702 and https://graphworkflow.com/eda/bounded01/). For probabilities with many zeros, many ones, or both, zero-, one-, and zero-one-inflated beta regression can be fitted via gamlss::gamlss(). See Korosteleva, O. (2019). Advanced regression models with SAS and R. CRC Press.
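The boundary adjustment can be sketched as follows. The helper name squeeze01 is hypothetical, and "half of the smaller distance" is read here as half of the smaller of the two gaps between a boundary and its nearest interior observation; other readings are possible.

```r
# Hypothetical helper: move exact 0s and 1s slightly inside (0, 1)
squeeze01 <- function(y) {
  inside <- y[!is.na(y) & y > 0 & y < 1]
  # Half of the smaller gap between a boundary and its closest interior value
  eps <- min(min(inside), 1 - max(inside)) / 2
  y[!is.na(y) & y <= 0] <- eps
  y[!is.na(y) & y >= 1] <- 1 - eps
  y
}

y <- c(0, 0.1, 0.23, 0.3, 1, NA)
y_adj <- squeeze01(y)
# y_adj is 0.05, 0.10, 0.23, 0.30, 0.95, NA

# The adjusted values can now be logit-transformed without producing +/-Inf
logit_y <- qlogis(y_adj)
```

Missing values are passed through untouched, so rows with NA in y are still dropped by the model-fitting function rather than by the adjustment.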

Add random slopes if likelihood ratio tests show they are necessary. Make sure there are enough levels in c1 (e.g. more than 10 different cities) to justify a mixed effect model. The {glmmTMB} package extends glm() and glmer(). Its alternative, the {brms} package, is built for a Bayesian approach. Note that the weights = argument in glmmTMB(), as in glm(), specifies weights that are inversely proportional to the dispersions; they are not automatically scaled to sum to one, and integer values specify the number of observation units. Therefore, you need to investigate what w1 stands for and evaluate how to use it in modeling.

merTools::predictInterval() generates many kinds of intervals for mixed models, some comparable to confidence intervals and prediction intervals in linear models without random effects. However, it supports lmer() model objects only. See https://cran.r-project.org/web/packages/merTools/vignettes/merToolsIntro.html and https://cran.r-project.org/web/packages/merTools/vignettes/Using_predictInterval.html. predictInterval(lmer(), include.resid.var = F) includes uncertainty from both the fixed and random effects of all coefficients, including the intercept, but excludes variation from multiple measurements of the same group or individual. This can be considered similar to prediction intervals of linear models without random effects. predictInterval(lmer(), include.resid.var = F, fix.intercept.variance = T) generates narrower intervals than the above by accounting for the covariance between the fixed and random effects of the intercept. predictInterval(lmer(), include.resid.var = F, ignore.fixed.terms = "(Intercept)") also narrows the intervals by removing uncertainty from the fixed effect of the intercept. If there are no random slopes, only a random intercept, the last two methods are comparable to confidence intervals of linear models without random effects. confint(lmer()) and confint(profile(lmer())) generate confidence intervals of model parameters such as a slope, so they do not produce confidence intervals of predicted outcomes.

You may also find the following functions and packages useful for generating CIs of mixed effect models: ggeffect() from {ggeffects}, predictions() from {marginaleffects}, margins() from {margins}, and prediction() from {prediction}. They can produce predictions averaged over the observed distribution of covariates, instead of making predictions by holding some predictors at specific values such as means or modes, which can be misleading and not useful.
