most efficient way of using categorical independent variables in a JAGS bayesian model

Question

I am trying to run a logistic regression model in JAGS, with a binomial response (rather than Bernoulli process) and wish to model month as a categorical variable. I can easily model month as a continuous model:


# Priors
beta0 ~ dunif(-20, 20)
beta1 ~ dunif(-10, 10)

# Likelihood:
for (i in 1:N){
   r400[i] ~ dbin(prop[i], n[i])          
   logit(prop[i]) <- logit.prop[i]            
   logit.prop[i] <- beta0 + beta1 * month[i]  
   }
}

After supplying data, initial values, parameters and iteration values, the model runs successfully with good model parameter convergence, with very similar estimates to the frequentist model.

I am trying to model month as a factor. I first created a series of 11 dummy variables (there are no missing month records), feb, mar, apr, ….., dec. (January is represented as well, but acts as the reference category). This gave the following model:


# Priors
beta0 ~ dunif(-50, 50)
beta1 ~ dunif(-15, 15)
beta2 ~ dunif(-15, 15)
beta3 ~ dunif(-15, 15)
beta4 ~ dunif(-5, 25)
beta5 ~ dunif(-5, 25)
beta6 ~ dunif(-5, 25)
beta7 ~ dunif(-5, 25)
beta8 ~ dunif(-5, 25)
beta9 ~ dunif(-5, 25)
beta10 ~ dunif(-15, 15)
beta11 ~ dunif(-15, 15)

# Likelihood: 
for (i in 1:N){
   r400[i] ~ dbin(prop[i], n[i])          
   logit(prop[i]) <- logit.prop[i]            
   logit.prop[i] <- beta0 + beta1 * feb[i] + beta2 * mar[i] +
                    beta3 * apr[i] + beta4 * may[i] + beta5 * jun[i] + 
                    beta6 * jul[i] + beta7 * aug[i] + beta8 * sep[i] +
                    beta9 * oct[i] + beta10 * nov[i] + beta11 * dec[i] 
   }
}

And this model converges. However, I was wondering if there was a more elegant way to do this over a series of beta1 parameters. I'm unsure as how to specify the likelihood across the multiple beta1 parameters.

However, I tried using the index system, but this didn't work.

To reply to user20650, all 12 months are represented. I’ve also tried with i=1:12, and ensure that priors are covered by same amount. Giving an example I have restricted the simulated data below to just the six spring/summer months as the remainder of the year produces sparse events. And Fernando is absolutely correct, the data forms a time series, so I wish to treat site as a random effect, but as a proof of concept, I was hoping to get the fixed effects model working first.

Here is simulated data for 10 sites, across ten years and for 6 months:

year <- rep(rep(c(2011:2020), 10), each=6)
month <- rep(c(5:10), 100)
n <- rpois(600, 5)+1
r400 <- rbinom(600, n, 0.1)
may <- ifelse(month==5, 1, 0)
jun <- ifelse(month==6, 1, 0)
jul <- ifelse(month==7, 1, 0)
aug <- ifelse(month==8, 1, 0)
sep <- ifelse(month==9, 1, 0)
oct <- ifelse(month==10, 1, 0)
dfx <- cbind.data.frame(site, year, month, n, r400, may, jun, jul, aug, sep, oct)
summary(dfx)


sink("mod.jags")
cat("
model {

# Priors
beta0 ~ dunif(-20, 20)
for(i in 1:5){
  beta1[i] ~ dunif(-20, 20)
}


# Likelihood: 
for (i in 1:N){
     r400[i] ~ dbin(prop[month[i]], n[month[i]])
     logit(prop[i]) <- logit.prop[i]           
     logit.prop[i] <- beta0 + beta1[month[i]] * month[i]  
  }
   
}
",fill = TRUE)
sink()

# Bundle data
win.data <- list(r400 = dfx$r400, 
                 N = length(dfx$r400), 
                 month = dfx$month, 
                 n = dfx$n)

# Initial values
inits <- function() list(beta0 = runif(1, -20, 20), 
                         beta1 = runif(5, -20, 20)) 

# Parameters monitored
params <- c("beta0", "beta1") 

# MCMC settings
ni <- 1000
nt <- 10
nb <- 500
nc <- 3

# Call JAGS from R 
out <- jags(win.data, inits, params, "mod.jags", 
            n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb)

With the same error message as before:

  RUNTIME ERROR:
Compilation error on line 15.
Index out of range taking subset of  beta1

I get the same error message if I use i:1-6 for beta1 and the inits.

Can somebody give advice on what’s going wrong with the model set up.

Thanks, Mark

In your last example; is it possible that your months are coded 2-12 (as Jan isn't there). But yourr beta1 indices are from 1 to 11. Therefore, when month is 12 then bete1[month[i]] = beta1[12] would give n index error? Otherwise can you share a small reproducible example of full code, and data. — user20650, Sep 08 '22 at 15:21
The code looks ok to me. Maybe the supplied data is causing the problem? Another thing, the subjects are independent? I ask this, because usually when treating with monthly (time series) data, you take more than 1 observation on the same subject, leading to autocorrelation. If this is the case, maybe you should add a random intercept or making a mixed effects model. — Fernando, Sep 08 '22 at 15:36
Hi user20650 and Fernando, I've edited my post showing simulated data and my full code. I hope this helps. Mark — Mark, Sep 08 '22 at 16:24
@user20650 is correct. `month` takes values from `5` to `11`. But `beta1` is indexed from `1` to `5`. Hence `beta1[month[i]]` fails. Try `beta1[month[i]-5]` and check for similar errors elsewhere. — Limey, Sep 08 '22 at 16:41
Thanks for the update@Mark. Quick comments. 1) I think it would be easier to pass the model matrix of `month` as data e.g. use `win.data <- list(..., month = model.matrix(~factor(dfx$month) - 1), ...)` (where the `...` are the other terms), 2) the `beta` loop index should be 6 and not 5 (it wouls be best to pass this as a parameter rather than hard coding), 3) you can use either of [these](https://stackoverflow.com/questions/63457703/error-attempt-to-redefine-node-in-linear-regression/63458966#63458966) to multiply — user20650, Sep 08 '22 at 20:59
... so copyinh from code form link the likelihood could be `temp <- month %*% beta1 ; for (i in 1:N){ r400[i] ~ dbin(prop[i], n[i]) ; logit(prop[i]) <- beta0 + temp[i] }` or `for (i in 1:N){ r400[i] ~ dbin(prop[i], n[i]) ; logit(prop[i]) <- beta0 + inprod(beta1, month[i,]); }` — user20650, Sep 08 '22 at 21:00
Hi user20650, Thanks for the suggested code. The model is now (slowly) running, so that's great. It's curious that the reparameterisation of the model has led to much slower convergence. Thanks again, — Mark, Sep 09 '22 at 11:06
@Mark ; the nested indexing in bugs is fast but the indexing can get a bit complicated once random effects are included (usually by nested indexing) -- but you could certainly use it by changing the values that month takes re Limey's comment above. I'd also think that the speed of multiplication in the code I provided will depend on what blas library that you use. — user20650, Sep 09 '22 at 13:32
... oh I also load the glm module e.g. `load.module("glm")` (see [this](https://stackoverflow.com/questions/52901838/weird-output-of-poisson-glm-with-an-iid-random-effect-in-r/52936131#52936131) example) — user20650, Sep 09 '22 at 13:35
Thanks user20650. What exactly does the load.module() function do? It seems to slow down the MCMC process (by a lot), as if instead it was running a lot more iterations. — Mark, Sep 14 '22 at 13:54
it loads a different set of samples for liner/generalized linear models -- it often can help with convergence — user20650, Sep 14 '22 at 22:22

most efficient way of using categorical independent variables in a JAGS bayesian model

0 Answers0