3

In a previous question (Generate a predicted count distribution from a ZINB model of class glmmTMB) I asked how to generate a predicted count distribution for a zero-inflated negative binomial model of class "glmmTMB". One solution that I have since found to that question is the function simulate.glmmTMB (https://www.rdocumentation.org/packages/glmmTMB/versions/0.2.3/topics/simulate.glmmTMB). However, I want to do the simulation on test data to validate a model for predictive ability and I only see how to run simulations on the same data used to fit the model.

In the example below, how could I simulate outcomes for the newdata data frame?

library(glmmTMB)
data("bioChemists", package = "pscl")
zinb <- glmmTMB(art ~ fem + mar + kid5 + phd + ment, ziformula = ~ ., data = 
bioChemists, family = nbinom2(link = "log"))
sim_1 <- simulate(zinb) #works as expected

#make new dataframe
newdata = unique(bioChemists[,c("fem","mar","kid5","phd","ment")])
sim_2 <- simulate(zinb, newdata = newdata) #ignores newdata
Andrew Bade
  • 333
  • 3
  • 10
  • 1
    this probably has to be done by hand ... – Ben Bolker Mar 18 '19 at 21:27
  • Thanks for the quick response, @Ben Bolker. Would that entail something along the lines of this answer (https://stackoverflow.com/a/14967981/6365720)? I've tried to extrapolate that solution to this case, but had trouble generalizing it to a much more complicated model structure. I'm happy to put the time in to do this by hand but frankly do not know where to begin. Thanks again. – Andrew Bade Mar 18 '19 at 21:42

1 Answers1

4

I think this works (could be encapsulated in a function etc.):

n <- nrow(newdata)
## construct model matrices for conditional and Z-I
##  components (.[-2] drops the LHS of the formula)
X <- model.matrix(formula(zinb)[-2],newdata)
X_zi <- model.matrix(formula(zinb,component="cond")[-2],newdata)
## extract coefficients
beta <- fixef(zinb)$cond
beta_zi <- fixef(zinb)$zi
## draw random values for each component
cond <- rnbinom(n, mu=exp(X %*% beta), size=sigma(zinb))
zi <- rbinom(1, prob=plogis(X_zi %*% beta_zi), size=1)
cond*zi

The last step is a little too clever: ifelse(zi==0,0,cond) might be clearer, or for the last three steps you could use the rzinbinom function in the emdbook package ...

In general I think simulate() methods should allow both newdata and newparams - it opens up a big range of possibilities for parametric bootstrapping, posterior simulation, etc. etc..


More compactly (and probably more robustly):

cond <- predict(zinb, newdata=newdata, type="conditional")
zi <- predict(zinb, newdata=newdata, type="zprob")
emdbook::rzinbinom(nrow(newdata),
                   mu=cond, size=sigma(zinb),
                   zprob=zi)
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Thank you for taking the time to walk me through this - I really appreciate it. It is also vindicating to hear you agree that `simulate()` should allow `newdata`. – Andrew Bade Mar 19 '19 at 19:42
  • @benbolker are you aware of a method for this but when you have also have predictors on the dispersion parameter? So the original formula might be something like `glmmTMB(art ~ fem + mar + kid5 + phd + ment, dispformula = ~ fem, ...)`. I can't find a way to extract predictions for the dispersion parameter from the generated model. – Will T-E Jul 18 '19 at 19:10
  • The only way I know to make these predictions is the hard way, i.e. `f <- formula(model, component="disp"); X <- model.matrix(f, data=newdata); beta <- fixef(model)[["disp"]]; eta <- X %*% beta; sd_pred <- exp(eta)` – Ben Bolker Jul 18 '19 at 19:37
  • @BenBolker how might you implement this solution for an nbinom1 model estimated in glmmTMB? I'm assuming that the fact that `simulate.glmmTMB()` provides counts for an NB-1 model suggests that some sort of `rnbinom1()`-like function somewhere, but all the simulation pawned off to TMB and C functions in that package, I can't figure out where. – Joe Mar 20 '20 at 20:47