3

I fitted a glmmTMB model using family = nbinom1. Now I would like to perform a simulation of data based on predicted values and the dispersion. However, from the help files, it looks like the go-to rnbinom function uses the family=nbinom2 parameterization where variance is equal to mu + mu^2/size.

1) Can anyone help me figure out how to simulate family=nbinom1 data (where variance is equal to mu + mu*size)?

2) Also, is my extraction / use of the dispersion value as size correct?

Thanks so much!

Current code (data not provided, because doesn't matter), using the stats:::rnbinom function despite the mismatch of variance definition:

library(glmmTMB)

mod <- glmmTMB(y ~ x + (1 | ID), data = df, family = nbinom1)
preds <- predict(mod, type = "response")
size <- sigma(mod)
sim <- rnbinom(nrow(df), mu = preds, size = size)
user2602640
  • 640
  • 6
  • 21
  • Can attempt a detailed answer later, you can easily work out the variance first using nbinom1 formula, calculate the size, and pass it back to rnbinom. size can be a vector as long as mu – StupidWolf May 10 '20 at 21:35
  • sigma(mod) is ok. – StupidWolf May 10 '20 at 21:39
  • True, good point. If you write up the answer, I'll accept it. – user2602640 May 10 '20 at 23:04
  • 2
    if you just want to simulate from the observed predictors/fitted model there's a `simulate()` method for the fitted object (also good as a cross-check on any simulation you come up with) – Ben Bolker May 10 '20 at 23:34

2 Answers2

1

We can try to simulate nbinom1, so if the variance is mu + mu*k:

set.seed(111)
k = 2
x = runif(100,min=1,max=3)
y = rnbinom(100,mu=exp(2*x),size=exp(2*x)/k)
ID = sample(1:2,100,replace=TRUE)
df = data.frame(x,y,ID)
mod <- glmmTMB(y ~ x + (1 | ID), data = df, family = nbinom1)

sigma(mod)
[1] 1.750076

In the above, for every mean, mu, I specified a size that is mu / k so that it will give an expected variance of mu*k. This shows that as long as you parameterize the rnbinom correctly, you get back rnbinom1.

Now with this model, if we need to simulate data, it's just using the same parameterization as above:

preds <- predict(mod, type = "response")
size <- sigma(mod)
sim <- rnbinom(nrow(df), mu = preds, size = preds/size)

plot(sim,df$y)

enter image description here

StupidWolf
  • 45,075
  • 17
  • 40
  • 72
1

There are a variety of issues here, including:

  • sigma(mod) gives the estimated standard deviation of the residuals; it is not a variance but the square-root of a variance, so you might want to square it.
  • there are many parametrisations of a negative binomial distribution beyond R's version, but in R's version, if the mean is mean(dat) and the variance var(dat) then you can estimate size with mean(dat)^2/(var(dat)-mean(dat)) and the probability prob with mean(dat)/var(dat)
  • rnbinom() will tolerate size being non-integer or infinite despite this being a theoretical nonsense; it will not tolerate size being negative which can happen if var(dat) is less than mean(dat). It will also have problems the mean is negative or if size is zero.

So perhaps you could consider adapting your simulation lines to something like

sizes <- ifelse(sigma(mod) ^ 2 > preds, preds ^ 2 / (sigma(mod) ^ 2 - preds), Inf)
sim <- ifelse(preds > 0, rnbinom(nrow(df), mu = preds, size = sizes), 0) 

then you might still get errors when sigma(mod) is less than or equal to preds

Henry
  • 6,704
  • 2
  • 23
  • 39
  • actually according to `?sigma.glmmTMB` "nbinom1 returns an overdispersion parameter (usually denoted alpha as in Hardin and Hilbe (2007)): such that the variance equals mu(1+alpha). – Ben Bolker May 10 '20 at 23:40
  • @BenBolker - you may well be correct: I had looked at the basic `?sigma` – Henry May 10 '20 at 23:48