
I'm using the hierarchical modelling framework described by Kruschke to set up a comparison between two models in JAGS. The idea in this framework is to run and compare multiple versions of a model by specifying each version as one level of a categorical variable. The posterior distribution of this categorical variable can then be interpreted as the relative probability of the various models.

In the code below, I'm comparing two models. The models are identical in form. Each has a single parameter that needs to be estimated, mE. As can be seen, the models differ in their priors. Both priors are beta distributions with a mode of 0.5; however, the prior for model 2 is much more concentrated. Note also that I've used pseudo-priors that I had hoped would keep the chains from getting stuck on one of the models. But the model seems to get stuck anyway.
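
For reference, the dbeta() shape arguments in the model come from Kruschke's mode/concentration parameterisation of the beta distribution: shape1 = omega*(kappa - 2) + 1 and shape2 = (1 - omega)*(kappa - 2) + 1, where omega is the mode and kappa the concentration. A quick R sketch (separate from the model itself) to visualise the two true priors:

omega <- 0.5
shape1 <- function(kappa) omega * (kappa - 2) + 1        # first beta shape
shape2 <- function(kappa) (1 - omega) * (kappa - 2) + 1  # second beta shape
curve(dbeta(x, shape1(3), shape2(3)), from = 0, to = 1,
      ylab = "prior density")                                 # model 1: beta(1.5, 1.5)
curve(dbeta(x, shape1(10), shape2(10)), add = TRUE, lty = 2)  # model 2: beta(5, 5)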

Here is the model:

model {

  m ~ dcat( mPriorProb[] )
  mPriorProb[1] <- .5
  mPriorProb[2] <- .5

  omegaM1[1] <- 0.5      # true prior for Model 1
  omegaM1[2] <- 0.5      # pseudo-prior
  kappaM1[1] <- 3        # true prior for Model 1
  kappaM1[2] <- 5        # pseudo-prior for Model 1

  omegaM2[1] <- 0.5      # pseudo-prior
  omegaM2[2] <- 0.5      # true prior for Model 2
  kappaM2[1] <- 5        # pseudo-prior for Model 2
  kappaM2[2] <- 10       # true prior for Model 2

  for ( s in 1:Nsubj ) {

    # when m = 1, model 1 draws from its true prior and model 2 from its
    # pseudo-prior; when m = 2, the roles are reversed
    mE1[s] ~ dbeta( omegaM1[m]*(kappaM1[m]-2)+1 , (1-omegaM1[m])*(kappaM1[m]-2)+1 )
    mE2[s] ~ dbeta( omegaM2[m]*(kappaM2[m]-2)+1 , (1-omegaM2[m])*(kappaM2[m]-2)+1 )

    # use the estimate belonging to the current model index
    mE[s] <- equals(m,1)*mE1[s] + equals(m,2)*mE2[s]

    z[s] ~ dbin( mE[s] , N[s] )

  }
}

Here is R code for the relevant data:

dataList = list(
  z = c(81, 59, 36, 18, 28, 59, 63, 57, 42, 28, 47, 55, 38, 
        30, 22, 32, 31, 30, 32, 33, 32, 26, 13, 33, 30), 
  N = rep(96, 25),
  Nsubj = 25
)

When I run this model, the MCMC spends every single iteration at m = 1, and never jumps over to m = 2. I've tried lots of different combinations of priors and pseudo-priors, and can't seem to find a combination in which the MCMC will consider m = 2. I've even tried specifying identical priors and pseudo-priors for models 1 and 2, but this was no help. In this situation, I would expect the MCMC to jump fairly frequently between models, spending about half its time on each; instead, JAGS spent the whole time at m = 1. I've run chains as long as 6000 iterations, which should be more than long enough for a simple model like this.

I would really appreciate if anyone has any thoughts on how to resolve this issue.

Cheers, Tim

  • You should assume we're not familiar with the exact example by Kruschke, and so this question would be best asked with reference to a simplified example and the necessary data for us to test it at our end. – jbaums Sep 21 '15 at 08:41
  • Thanks for the response, and the advice. I've edited the question now, so hopefully it is clearer. – Tim Sep 21 '15 at 12:34
  • Might also be worth linking to the page in Google Books... looks like [this](https://books.google.com.au/books?id=FzvLAwAAQBAJ&pg=PA282) is it. – jbaums Sep 21 '15 at 22:00
  • I think you need to move `m` into the loop and subscript all your `m`s by `s`, e.g. `m[s] ~ dcat(mPriorProb[]); mE1[s] ~ dbeta(omegaM1[m[s]]*(kappaM1[m[s]]-2)+1 , (1-omegaM1[m[s]])*(kappaM1[m[s]]-2)+1 ); etc.` That will allow the posterior probabilities of your two models to vary by subject (which seems to be what you're attempting by subscripting `mE1`/`mE2` by `s`). Alternatively, move `mE1`, `mE2`, and `mE` out of the loop and remove the `[s]` from those lines (to estimate a single probability for each model, common to all subjects). – jbaums Sep 22 '15 at 00:12
  • Sorry if I wasn't clear. I'd like to have a single `m` estimated for all subjects - so I'm looking for a single probability for each model. However, I want the parameter `mE` to vary by subject. I've got both of your suggestions to work, but I don't think either captures the structure I want. Moving the `m` makes that parameter vary by subject. Taking `mE1`, `mE2`, and `mE` out of the loop fixes the `mE` parameter to be equal across subjects. – Tim Sep 22 '15 at 02:42

3 Answers


I haven't been able to figure this out, but I thought that anybody else who works on this might appreciate the following code, which will reproduce the problem start-to-finish from R with rjags (must have JAGS installed).

Note that since there are only two competing models in this example, I changed m ~ dcat() to m ~ dbern(), and then replaced m with m+1 everywhere else in the code. I hoped this might ameliorate the behavior, but it did not. Note also that if we specify the initial value for m, it stays stuck at that value regardless of which value we pick, so m just fails to get updated properly (instead of getting weirdly attracted to one model or the other). A head-scratcher for me; could be worth posting for Martyn's eyes at http://sourceforge.net/p/mcmc-jags/discussion/

library(rjags)
load.module('glm')

dataList = list(
  z = c(81, 59, 36, 18, 28, 59, 63, 57, 42, 28, 47, 55, 38, 
        30, 22, 32, 31, 30, 32, 33, 32, 26, 13, 33, 30), 
  N = rep(96, 25),
  Nsubj = 25
)

sink("mymodel.txt")
cat("model {

  m ~ dbern(.5)

  omegaM1[1] <- 0.5      #true prior
  omegaM1[2] <- 0.5      #psuedo prior 
  kappaM1[1] <- 3        #true prior for Model 1
  kappaM1[2] <- 5        #puedo prior for Model 1

  omegaM2[1] <- 0.5      #psuedo prior
  omegaM2[2] <- 0.5      #true prior
  kappaM2[1] <- 5        #puedo  prior for Model 2
  kappaM2[2] <- 10        #true prior for Model 2

    for ( s in 1:Nsubj ) {

    mE1[s] ~ dbeta(omegaM1[m+1]*(kappaM1[m+1]-2)+1 , (1-omegaM1[m+1])*(kappaM1[m+1]-2)+1 )
    mE2[s] ~ dbeta(omegaM2[m+1]*(kappaM2[m+1]-2)+1 , (1-omegaM2[m+1])*(kappaM2[m+1]-2)+1 )


    z[s] ~ dbin( (1-m)*mE1[s] + m*mE2[s] , N[s] )

    }
    }
    ", fill=TRUE)
sink()
inits <- function(){list(m=0)}

params <- c("m")

nc <- 1
n.adapt <- 100
n.burn <- 200
n.iter <- 5000
thin <- 1
mymodel <- jags.model('mymodel.txt', data = dataList, inits=inits, n.chains=nc, n.adapt=n.adapt)
update(mymodel, n.burn)
mymodel_samples <- coda.samples(mymodel,params,n.iter=n.iter, thin=thin)
summary(mymodel_samples)
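
Because m is Bernoulli here, the posterior probability of model 2 is just the proportion of sampled m values equal to 1, so the stuck chain is easy to see from the draws (assuming the objects created by the script above):

m_draws <- as.matrix(mymodel_samples)[, "m"]
mean(m_draws)          # estimated Pr(model 2); stays at 0 here, illustrating the problem
plot(mymodel_samples)  # trace plot: a chain stuck on one model is a flat line
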
Jacob Socolar

The trick is not to assign a fixed probability to the model index, but to estimate it (phi below) from a uniform prior. You then want the posterior distribution of phi, as it tells you the probability of selecting model 2 (i.e., a "success" means m = 1; Pr(model 1) = 1 - phi).

sink("mymodel.txt")
cat("model {

  m ~ dbern(phi)
  phi ~ dunif(0,1)

  omegaM1[1] <- 0.5      #true prior
  omegaM1[2] <- 0.5      #psuedo prior 
  kappaM1[1] <- 3        #true prior for Model 1
  kappaM1[2] <- 5        #puedo prior for Model 1

  omegaM2[1] <- 0.5      #psuedo prior
  omegaM2[2] <- 0.5      #true prior
  kappaM2[1] <- 5        #puedo  prior for Model 2
  kappaM2[2] <- 10       #true prior for Model 2

  for ( s in 1:Nsubj ) {

    mE1[s] ~ dbeta(omegaM1[m+1]*(kappaM1[m+1]-2)+1 , (1-omegaM1[m+1])*(kappaM1[m+1]-2)+1 )
    mE2[s] ~ dbeta(omegaM2[m+1]*(kappaM2[m+1]-2)+1 , (1-omegaM2[m+1])*(kappaM2[m+1]-2)+1 )

    z[s] ~ dbin( (1-m)*mE1[s] + m*mE2[s] , N[s] )

    }
  }
", fill=TRUE)
sink()
inits <- function(){list(m=0)}

params <- c("phi")
Mark S
  • It's interesting that this runs whereas specifying the Bernoulli prior on m does not. However, note that in this example we do NOT seek inference on phi; we still seek inference on m. See my new answer below. – Jacob Socolar Apr 18 '16 at 03:59

See my comment above on Mark S's answer.

This answer is to show by example why we want inference on m and not on phi.

Imagine we have a model given by

data <- c(-1, 0, 1, .5, .1)

m ~ dbern(phi)
data[i] ~ dnorm((1 - m) * 100, 1)   # dnorm(0, 1) when m = 1; dnorm(100, 1) when m = 0

Now, it is obvious that the true value of m is 1. But what do we know about the true value of phi? Obviously higher values of phi are more likely, but we don't actually have good evidence to rule out lower values of phi. For example, phi=0.1 still has a 10% chance of yielding m=1; and phi=0.5 still has a 50% chance of yielding m=1. So we don't have good evidence against fairly low values of phi, even though we have ironclad evidence that m=1. We want inference on m.
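
To put a number on it: starting from the uniform prior, a single Bernoulli draw of m = 1 updates phi to a Beta(2, 1) posterior, which still leaves a quarter of its mass below 0.5:

# Beta(1, 1) prior + one Bernoulli success (m = 1)  =>  Beta(2, 1) posterior
pbeta(0.5, 2, 1)   # P(phi < 0.5 | m = 1) = 0.25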

Jacob Socolar