2

I normally work with lme4 package, but the glmmTMB package is increasingly becoming better suited to work with highly complicated data (think overdispersion and/or zero-inflation).

Is there a way to extract posterior modes and credible intervals from glmmTMB models, similar to how it is done for lme4 models (example here).

Details:

I am working with count data (available here) that are zero-inflated and overdispersed and have random effects. The package best suited to work with this sort of data is the glmmTMB (details here). (Note two outliers: euc0==78 and np_other_grass==20).

The data looks like this:

euc0 ea_grass ep_grass np_grass np_other_grass month year precip season   prop_id quad
 3      5.7      0.0     16.7            4.0     7 2006    526 Winter    Barlow    1
 0      6.7      0.0     28.3            0.0     7 2006    525 Winter    Barlow    2
 0      2.3      0.0      3.3            0.0     7 2006    524 Winter    Barlow    3
 0      1.7      0.0     13.3            0.0     7 2006    845 Winter    Blaber    4
 0      5.7      0.0     45.0            0.0     7 2006    817 Winter    Blaber    5
 0     11.7      1.7     46.7            0.0     7 2006    607 Winter    DClark    3

The glmmTMB model:

model<-glmmTMB(euc0 ~ ea_grass + ep_grass + np_grass + np_other_grass + (1|prop_id), data = euc, family= nbinom2) #nbimom2 lets var increases quadratically
summary(model)
confint(model) #this gives the confidence intervals

How I would normally extract the posterior mode and credible intervals for a lmer/glmer model:

#extracting model estimates and credible intervals
sm.model <-arm::sim(model, n.sim=1000)
smfixef.model = sm.model@fixef
smfixef.model =coda::as.mcmc(smfixef.model)
MCMCglmm::posterior.mode(smfixef.model)  #mode of the distribution
coda::HPDinterval(smfixef.model)  #credible intervals

#among-brood variance
bid<-sm.model@ranef$prop_id[,,1]
bvar<-as.vector(apply(bid, 1, var)) #between brood variance posterior distribution
bvar<-coda::as.mcmc(bvar)
MCMCglmm::posterior.mode(bvar) #mode of the distribution
coda::HPDinterval(bvar) #credible intervals
Blundering Ecologist
  • 1,199
  • 2
  • 14
  • 38

2 Answers2

2

Most of an answer:

  1. Getting a multivariate Normal sample of the parameters of the conditional model is pretty easy (I think this is what arm::sim() is doing.
library(MASS)
pp <- fixef(model)$cond
vv <- vcov(model)$cond
samp <- MASS::mvrnorm(1000, mu=pp, Sigma=vv)

(then use the rest of your method above).

  1. I'm a little skeptical that your second example is doing what you want it to do. The variance of the conditional modes is not necessarily a good estimate of the between-group variance (e.g. see here). Furthermore, I'm nervous about the half-assed-Bayesian approach (e.g., why no priors? Why look at the posterior mode, which is rarely a meaningful value in a Bayesian context?) although I do sometimes use similar approaches myself!) However, it's not too hard to use glmmTMB results to do a proper Markov chain Monte Carlo analysis:
library(tmbstan)
library(rstan)
library(coda)
library(emdbook) ## for lump.mcmc.list(), or use runjags::combine.mcmc()

t2 <- system.time(m2 <- tmbstan(model$obj))
m3 <- rstan::As.mcmc.list(m2)
lattice::xyplot(m3,layout=c(5,6))
m4 <- emdbook::lump.mcmc.list(m3)
coda::HPDinterval(m4)

It may be helpful to know that the theta column of m4 is the log of the among-group standard standard deviation ...

(See vignette("mcmc", package="glmmTMB") for a little bit more information ...)

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • 1
    Just for the record, samples of parameters can be easily obtained (with literally the same underlying code) from [`parameters::simulate_model()`](https://easystats.github.io/parameters/reference/simulate_model.html) resp. for the summary [`parameters::simulate_parameters()`](https://easystats.github.io/parameters/reference/simulate_parameters.html) (including nice [`plot()`-methods](https://easystats.github.io/see/articles/parameters.html)). – Daniel May 26 '20 at 15:38
  • I spent some time looking at the vignette you linked to. I still don't fully understand the output for your `t2`, `m3`, and `m4`. Would it be possible to annotate the code a bit to help me understand what each function is doing/showing? – Blundering Ecologist May 27 '20 at 16:45
1

I think Ben has already answered your question, so my answer does not add much to the discussion... Maybe just one thing, as you wrote in your comments that you're interested in the within- and between-group variances. You can get these information via parameters::random_parameters() (if I did not misunderstand what you were looking for). See example below that first generates simulated samples from a multivariate normal (just like in Ben's example), and later gives you a summary of the random effect variances...

library(readr)
library(glmmTMB)
library(parameters)
library(bayestestR)
library(insight)

euc_data <- read_csv("D:/Downloads/euc_data.csv")
model <-
  glmmTMB(
    euc0 ~ ea_grass + ep_grass + np_grass + np_other_grass + (1 | prop_id),
    data = euc_data,
    family = nbinom2
  ) #nbimom2 lets var increases quadratically


# generate samples
samples <- parameters::simulate_model(model)
#> Model has no zero-inflation component. Simulating from conditional parameters.


# describe samples
bayestestR::describe_posterior(samples)
#> # Description of Posterior Distributions
#> 
#> Parameter      | Median |           89% CI |    pd |        89% ROPE | % in ROPE
#> --------------------------------------------------------------------------------
#> (Intercept)    | -1.072 | [-2.183, -0.057] | 0.944 | [-0.100, 0.100] |     1.122
#> ea_grass       | -0.001 | [-0.033,  0.029] | 0.525 | [-0.100, 0.100] |   100.000
#> ep_grass       | -0.050 | [-0.130,  0.038] | 0.839 | [-0.100, 0.100] |    85.297
#> np_grass       | -0.020 | [-0.054,  0.012] | 0.836 | [-0.100, 0.100] |   100.000
#> np_other_grass | -0.002 | [-0.362,  0.320] | 0.501 | [-0.100, 0.100] |    38.945


# or directly get summary of sample description
sp <- parameters::simulate_parameters(model, ci = .95, ci_method = "hdi", test = c("pd", "p_map"))
sp
#> Model has no zero-inflation component. Simulating from conditional parameters.
#> # Description of Posterior Distributions
#> 
#> Parameter      | Coefficient | p_MAP |    pd |              CI
#> --------------------------------------------------------------
#> (Intercept)    |      -1.037 | 0.281 | 0.933 | [-2.305, 0.282]
#> ea_grass       |      -0.001 | 0.973 | 0.511 | [-0.042, 0.037]
#> ep_grass       |      -0.054 | 0.553 | 0.842 | [-0.160, 0.047]
#> np_grass       |      -0.019 | 0.621 | 0.802 | [-0.057, 0.023]
#> np_other_grass |       0.019 | 0.999 | 0.540 | [-0.386, 0.450]

plot(sp) + see::theme_modern()
#> Model has no zero-inflation component. Simulating from conditional parameters.

# random effect variances
parameters::random_parameters(model)
#> # Random Effects
#> 
#> Within-Group Variance         2.92 (1.71)
#> Between-Group Variance
#>   Random Intercept (prop_id)   2.1 (1.45)
#> N (groups per factor)
#>   prop_id                       18
#> Observations                   346

insight::get_variance(model)
#> Warning: mu of 0.2 is too close to zero, estimate of random effect variances may be unreliable.
#> $var.fixed
#> [1] 0.3056285
#> 
#> $var.random
#> [1] 2.104233
#> 
#> $var.residual
#> [1] 2.91602
#> 
#> $var.distribution
#> [1] 2.91602
#> 
#> $var.dispersion
#> [1] 0
#> 
#> $var.intercept
#>  prop_id 
#> 2.104233

Created on 2020-05-26 by the reprex package (v0.3.0)

Daniel
  • 7,252
  • 6
  • 26
  • 38
  • I think I must be missing something. When I get to the `parameters::random_parameters(model)` line, I get an error: 'random_parameters' is not an exported object from 'namespace:parameters'. Is there a package I should be loading beforehand? Everything else runs well. – Blundering Ecologist May 27 '20 at 16:38
  • Also, some points of clarification. Is the `bayestestR::describe_posterior()` output analogous to the way I was extracting posterior modes for the fixed effects and credible intervals? What is the difference between this and the output from `parameters::simulate_parameters()`? – Blundering Ecologist May 27 '20 at 16:38
  • Sorry for all the questions... What object is called `sp` for your plot? – Blundering Ecologist May 27 '20 at 16:41
  • 1
    Regarding your first comment, I think you just need to update the parameters-package from CRAN (must be version 0.7.0). `bayestestR::describe_posterior()` just takes a Bayesian model, or a data frame (this is what you get from `arm::sim()` or `parameters::simulate_model()`, which literally does something very similar like arm::sin) and gives you some information on the distribution of the (posterior) samples. However, `bayestestR::describe_posterior()` does no extract the samples from a frequentist model, the samples you get from arm::sim or parameters::simulate_model. – Daniel May 28 '20 at 15:31
  • 1
    `parameters::simulate_parameters()` is a kind of combination of parameters::simulate_model (or arm::sim) and bayestestR::describe_posterior. It first calls internally simulate_model() to get the samples, and then calls describe_posterior() to give you the "summary information". Regarding `sp`: sorry, I accidentally deleted two lines of code, will edit the example. – Daniel May 28 '20 at 15:33
  • Thank you for those points of clarification. One additional follow-up: so, while the output for `bayestestR::describe_posterior()` and `parameters::simulate_parameters()` are similar, `parameters::simulate_parameters()` extracts samples from a frequentist model? – Blundering Ecologist May 28 '20 at 16:19
  • > parameters::simulate_parameters() extracts samples from a frequentist model? Yes, by calling simulate_model(), which [uses this code](https://github.com/easystats/parameters/blob/1d0ca173d0ae242e96728a67bfb209d89dae3f85/R/simulate_model.R#L441) to generate a multivariate normal sample of the parameters (you'll see that the code is fairly similar to what Ben posted, just that it's more generic and works for many other models, not only lme4/glmmTMB, as well). – Daniel May 28 '20 at 18:44