
Variations on this question have been asked before (e.g., Is there a way of getting "marginal effects" from a `glmer` object), and most answers suggest using ggeffects (or sjPlot). However, a statistician at our faculty has trouble agreeing with the output from ggeffects.

I'm new to both statistics and R (so I probably shouldn't be working with glmer...), and I'm having some trouble understanding what I'm doing wrong.

My model in its simplest form is a glmer from lme4: `outcome (binary) ~ FE (binary) + (1 | RE)`. The fixed effect is a test performed on some, but not all, of the individuals in my random effect.

Model output:

  Intercept: 1.2654
  FE_2: -0.2305
  RE Std. Dev.: 2.896

ggpredict:

  ggpredict(model, type = "fe", terms = "FE")

  1: 0.780
  2: 0.738

Now, as far as I've understood, I can get the marginal effect on the probability scale for test 2 like this:

y <- 1.2654 - 0.2305           # intercept + coefficient for FE level 2
prob <- exp(y) / (1 + exp(y))  # inverse logit, i.e. plogis(y)

This matches the output from ggpredict exactly: 0.738.

However, he's saying that this is the conditional probability for each individual, and that I need to "insert something I don't understand" to get the probability I can generalise to a population. His quick example of a solution:

y <- 1.2654 - 0.2305 + rnorm(100000000) * 2.896
prob_trus <- exp(y) / (1 + exp(y))
mean(prob_trus)
#> 0.62
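If I understand it correctly, his simulation is a Monte Carlo approximation of the marginal probability: the inverse logit averaged over the random-intercept distribution N(0, 2.896^2). A sketch (using the estimates above) that computes the same average deterministically with base R's integrate():

```r
eta <- 1.2654 - 0.2305   # fixed-effects linear predictor for FE level 2
sd_re <- 2.896           # random-intercept standard deviation

# Average the conditional probability over the random-effect distribution
marg <- integrate(function(u) plogis(eta + u) * dnorm(u, sd = sd_re),
                  lower = -Inf, upper = Inf)$value
marg
```

This avoids holding a vector of 10^8 draws in memory and should agree with the simulated 0.62 up to simulation error.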

Using ggaverage with the same arguments as the ggpredict call above gives me almost the same probability, 0.639. So, finally, my questions:

  • Is ggaverage the same as his solution just with some simulation difference and/or rounding error?
  • If his method is the way to go, how do I get the same results from ggeffects?
stapperen

2 Answers


ggaverage() is not the same as the "something I don't understand" method; the similar results are coincidental. You can read about the differences between ggpredict() and ggaverage() in the package vignette.

I would not say that the random-effect variance (the component your colleague also wants the predictions to be conditioned on) should be added to the predicted values when calculating predicted probabilities. Rather, this uncertainty should be reflected in the standard error of the prediction (and hence in the resulting confidence intervals).

You will see the difference when comparing ggpredict(model, type = "fe", terms = "FE") to ggpredict(model, type = "re", terms = "FE"). Notice the changed type = "re", which now takes the random effects into account, leading to different predicted values and larger confidence intervals.

Here's an example using the sample data shipped with the package:

library(lme4)
library(ggeffects)

data(efc_test)

fit <- glmer(
  negc7d ~ c12hour + e42dep + c161sex + c172code + (1 | grp),
  data = efc_test,
  family = binomial(link = "logit")
)

ggpredict(fit, "c12hour", type = "fe")

#> # Predicted probabilities for Negative impact with 7 items 
#> # x = average number of hours of care per week 
#> 
#>   x predicted conf.low conf.high
#>   0     0.342    0.249     0.449
#>   5     0.344    0.251     0.450
#>  10     0.346    0.254     0.452
#>  15     0.348    0.256     0.453
#>  20     0.350    0.258     0.455
#>  25     0.352    0.260     0.456
#>  30     0.354    0.261     0.458
#>  35     0.356    0.263     0.460
#>  40     0.357    0.265     0.463
#>  45     0.359    0.266     0.465
#>  ... and 25 more rows.
#> 
#> Adjusted for:
#> *   e42dep = 2.92
#> *  c161sex = 1.76
#> * c172code = 1.97

ggpredict(fit, "c12hour", type = "re")

#> # Predicted probabilities for Negative impact with 7 items 
#> # x = average number of hours of care per week 
#> 
#>   x predicted conf.low conf.high
#>   0     0.472    0.107     0.870
#>   5     0.475    0.108     0.871
#>  10     0.477    0.109     0.872
#>  15     0.479    0.110     0.873
#>  20     0.481    0.111     0.873
#>  25     0.483    0.111     0.874
#>  30     0.485    0.112     0.875
#>  35     0.487    0.113     0.876
#>  40     0.489    0.114     0.877
#>  45     0.491    0.115     0.878
#>  ... and 25 more rows.
#> 
#> Adjusted for:
#> *   e42dep = 2.92
#> *  c161sex = 1.76
#> * c172code = 1.97
Daniel
  • Thank you, this does at least make more intuitive sense. It's starting to dawn on me that there is not just one "correct" answer to this question. I've also worked through this (https://stats.idre.ucla.edu/r/dae/mixed-effects-logistic-regression/) example with my own data and get results similar to ggaverage. Using ggaverage, though, won't give me confidence intervals for both outcomes. Is this a mistake on my part, related to the data, or intentional? – stapperen Aug 28 '18 at 06:18
  • As `ggaverage()` uses the data from the model, you may get multiple, different predicted values for one point at the x-axis - that's why I "smooth" these predictions, however, there's no standard error for this "smoothing" that represents the standard error of the _predicted values_. That's why no CI are shown. I would rarely use this function, but rather use `ggeffect()` or `ggpredict()`. – Daniel Aug 28 '18 at 08:26

Indeed, the coefficients you obtain from a mixed model with a nonlinear link function (e.g., a mixed-effects logistic regression) have, in general, an interpretation conditional on the random effects.
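To get a sense of how large this difference can be, there is a well-known approximation (see, e.g., Zeger et al., 1988): the marginal log odds are roughly the conditional log odds shrunk by 1 / sqrt(1 + c^2 * sigma^2), where c = 16 * sqrt(3) / (15 * pi) and sigma is the random-intercept standard deviation. A sketch applying this to the numbers in the question:

```r
# Approximate shrinkage of conditional log odds toward marginal log odds
# for a logit link with a normally distributed random intercept
c_logit <- 16 * sqrt(3) / (15 * pi)
sd_re <- 2.896                              # random-intercept SD from the question
shrink <- 1 / sqrt(1 + c_logit^2 * sd_re^2)

eta_cond <- 1.2654 - 0.2305                 # conditional log odds for FE level 2
plogis(eta_cond * shrink)                   # close to the simulated 0.62
```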

Hedeker et al. (2018) recently proposed an approach for obtaining regression coefficients with a marginal/population interpretation. This is implemented in the function marginal_coefs() of the R package GLMMadaptive, which fits mixed models using adaptive Gaussian quadrature. A generic example for a mixed-effects logistic regression:

library("GLMMadaptive")

# DF: a data frame with binary outcome y, covariates x1, x2, x3,
# and grouping variable id
fm <- mixed_model(fixed = y ~ x1 + x2 + x3, random = ~ x1 | id,
                  data = DF, family = binomial())

marginal_coefs(fm)
marginal_coefs(fm, std_errors = TRUE)

For additional examples, e.g., how to obtain marginal predictions or produce effects plots, check the vignette: https://drizopoulos.github.io/GLMMadaptive/articles/Methods_MixMod.html

  • This looks great! I've done a few quick tests and I'm getting approximately the same results from GLMMadaptive and lme4. I've got one question (probably more to follow): what is the scale of the output from marginal_coefs()? In the vignette it says "The function calculates the marginal probabilities in our case", but the output for the coefficients is negative. Shouldn't they be between 0 and 1 if it's calculating probabilities, or is this just another misinterpretation on my part? – stapperen Aug 28 '18 at 08:08
  • The coefficients returned by `marginal_coefs()` are on the same scale as the fixed-effects coefficients; they just have a different interpretation (i.e., a marginal/population interpretation). Hence, in the case of mixed-effects logistic regression, they are log odds ratios. – Dimitris Rizopoulos Aug 28 '18 at 13:38