
Imagine an underlying process that draws a number from a normal distribution with probability $\alpha$ and from a uniform distribution with probability $1 - \alpha$. The observed sequence of numbers generated by this process therefore follows a distribution $f$ that is a mixture of two components with mixing weights $\alpha$ and $1 - \alpha$. How would you model this kind of mixture with JAGS when the observed sequence is the only input, but the parametric families are known?

Example (in R):

set.seed(8361299)
N <- 100
alpha <- 0.3
mu <- 5
max <- 50
# Which component to choose from?
latent_class <- rbinom(N, 1, alpha)
Y <- ifelse(latent_class, runif(N, min=mu, max=max), rnorm(N, mean=mu))

The generated (observed) Y looks like this: [plot of the generated Y omitted]

With JAGS, should it be possible to obtain the mixing weights as well as the parameters of the known components?

jenzopr

1 Answer


Mixture models of the same parametric distribution are pretty straightforward in JAGS/BUGS, but mixture models with varying parametric responses (like yours) are a little trickier. One method is to use the 'ones trick', whereby we manually calculate the likelihood of each response (selecting one of the two distributions as specified by the latent part of the model) and use it as the success probability of a (fake) Bernoulli trial with observed outcome 1 for each data point. For example:

# Your data generation:
set.seed(8361299)
N <- 100
alpha <- 0.3
mu <- 5
max <- 50
# Which component to choose from?
latent_class <- rbinom(N, 1, alpha)
Y <- ifelse(latent_class, runif(N, min=mu, max=max), rnorm(N, mean=mu))

# The model:
model <- "model{

    for(i in 1:N){

        # Log density for the normal part:
        ld_norm[i] <- logdensity.norm(Y[i], mu, tau)

        # Log density for the uniform part:
        ld_unif[i] <- logdensity.unif(Y[i], lower, upper)

        # Select one of these two densities:
        density[i] <- exp(ld_norm[i]*norm_chosen[i] + ld_unif[i]*(1-norm_chosen[i]))

        # Generate a likelihood for the MCMC sampler:
        Ones[i] ~ dbern(density[i])

        # The latent class part as usual:
        norm_chosen[i] ~ dbern(prob)

    }

    # Priors:
    lower ~ dnorm(0, 10^-6)
    upper ~ dnorm(0, 10^-6)
    prob ~ dbeta(1,1)
    mu ~ dnorm(0, 10^-6)
    tau ~ dgamma(0.01, 0.01)

    # Specify monitors, data and initial values using runjags:
    #monitor# lower, upper, prob, mu, tau
    #data# N, Y, Ones
    #inits# lower, upper
}"

# Run the model using runjags (or use rjags if you prefer!)
library('runjags')

lower <- min(Y)-10
upper <- max(Y)+10

Ones <- rep(1,N)

results <- run.jags(model, sample=20000, thin=1)
results

plot(results)

This seems to recover your parameters pretty well (your alpha is 1-prob), but watch out for autocorrelation (and convergence).
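As an aside (not part of the original answer), runjags reports the Gelman-Rubin statistic (psrf) and the effective sample size (SSeff) for each monitored parameter, so a minimal check on the fitted object could look like this:

# Inspect convergence and autocorrelation diagnostics:
summary(results)    # look at the psrf and SSeff columns
# If SSeff is low or psrf is noticeably above 1, draw additional samples:
results <- extend.jags(results, sample=20000)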

Matt


EDIT: Since you asked about generalising to more than 2 distributions, here is equivalent (but more generalisable) code:

# The model:
model <- "model{
    for(i in 1:N){
        # Log density for the normal part:
        ld_comp[i, 1] <- logdensity.norm(Y[i], mu, tau)
        # Log density for the uniform part:
        ld_comp[i, 2] <- logdensity.unif(Y[i], lower, upper)
        # Select one of these two densities and normalise with a Constant:
        density[i] <- exp(ld_comp[i, component_chosen[i]] - Constant)
        # Generate a likelihood for the MCMC sampler:
        Ones[i] ~ dbern(density[i])
        # The latent class part using dcat:
        component_chosen[i] ~ dcat(probs)
    }
    # Dirichlet prior on the 2 mixing weights:
    probs ~ ddirch(c(1,1))
    lower ~ dnorm(0, 10^-6)
    upper ~ dnorm(0, 10^-6)
    mu ~ dnorm(0, 10^-6)
    tau ~ dgamma(0.01, 0.01)
    # Specify monitors, data and initial values using runjags:
    #monitor# lower, upper, probs, mu, tau
    #data# N, Y, Ones, Constant
    #inits# lower, upper, mu, tau
}"

library('runjags')

# Initial values to get the chains started:
lower <- min(Y)-10
upper <- max(Y)+10
mu <- 0
tau <- 0.01

Ones <- rep(1,N)

# The constant needs to be big enough that every rescaled density stays <= 1, but small enough that the resulting Bernoulli probabilities do not all underflow towards zero:
Constant <- 10
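# (Added note, not in the original answer:) dbern() requires density[i] <= 1, i.e.
# exp(ld_comp[i,k] - Constant) <= 1, so Constant must be at least as large as the
# largest log density the sampler visits; making it far larger than necessary
# pushes every density[i] towards zero, which breaks the sampler in the same way.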

results <- run.jags(model, sample=10000, thin=1)
results

This code will work for as many distributions as you need, but expect exponentially worse autocorrelation with more distributions.
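To make that concrete, here is a hedged sketch (not from the original answer) of the only lines that would need to change for a hypothetical third, exponential component, assuming logdensity.exp is available alongside the other logdensity functions:

# Inside the for-loop, alongside the two existing components:
ld_comp[i, 3] <- logdensity.exp(Y[i], rate)

# Priors widened to three components:
probs ~ ddirch(c(1,1,1))
rate ~ dgamma(0.01, 0.01)

The Ones trick, the dcat latent class and the Constant stay exactly as above; rate would also need to be added to the #monitor# and #inits# lines.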

Matt Denwood
  • Great! Autocorrelation looks fine and the ESS is larger than 10000 for mu and prob! Previously, I was pointed to [choosing-different-distributions-based-on-if-else-condition-in-winbugs-jags](http://stackoverflow.com/questions/15414303/choosing-different-distributions-based-on-if-else-condition-in-winbugs-jags). In the accepted answer, Y was sampled from both distributions and one was selected based on an indicator variable, but that code fails for me with a syntax error. What do you think about the proposed solution? – jenzopr Apr 14 '16 at 12:37
  • 1
    That answer seems (at best) incomplete to me - it contains invalid syntax for a start.... In any case that approach won't work for you because the functional form of your response is different - the other question has dcat for everything, but you have dunif and dnorm. A standard approach for the same functional response would be to index the parameter, e.g. Y[i] ~ dnorm(mus[norm_chosen[i]], taus[norm_chosen[i]]) would work for your example if you wanted a mixture of normals (but then again, there is dnormmix for that!). – Matt Denwood Apr 14 '16 at 15:05
  • Thanks a lot for the explanation! Going back to your answer, I'm currently trying to make the transition from two finite components to n finite components. I was thinking about using dcat instead of dbern to generate the latent classes: `e[i] <- dcat(pi[1:2]) for(k in 1:2) { norm_chosen[i,k] <- (k == e[i]) }` And use `density[i] <- exp(ld_norm[i]*norm_chosen[i,1] + ld_unif[i]*norm_chosen[i,2])` instead. However, the model fails at some point with Node inconsistent with parents at node Ones[40]. I fear that the density can get out of bounds, when pi lacks a proper prior? – jenzopr Apr 15 '16 at 07:49
  • 1
    Yes, you can generalise that solution to n components using dcat instead of dbern and ddirch instead of dbeta. I would do something like: density[i] <- exp(ld_component[i, component_chosen[i]) and component_chosen[i] ~ dcat(probs) and probs ~ ddirch(c(1,1,1)) (where the length of the parameter to ddirch reflects the number of components). You also need to make sure that density[i] is always less than 1 - sometimes this means subtracting a constant from all density values. – Matt Denwood Apr 15 '16 at 08:02
  • The JAGS 4.0.0 [NEWS](https://sourceforge.net/p/mcmc-jags/code-0/ci/default/tree/NEWS) state that the logdensity.X functions return a "normalized log density". Doesn't this mean that the value is between 0 and 1? Thanks a lot for your time!! – jenzopr Apr 15 '16 at 09:12
  • 1
    Not necessarily - continuous functions may still have a normalised log density over 0, because it is a probability density and not a probability mass. For a silly example: m <- 'model{\n ld <- logdensity.norm(0, 0, 1000) \n }'; runjags::run.jags(m, monitor='ld') gives me around 2.5 (due to the high precision). See: http://math.stackexchange.com/questions/105455/how-can-a-probability-density-be-greater-than-one-and-integrate-to-one – Matt Denwood Apr 15 '16 at 09:24
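As a hedged illustration of the indexing approach mentioned in the comments above (not code from the original thread), a two-component mixture of normals could be written along these lines; the names group, mus, taus and probs are hypothetical, and in practice label switching would also need to be handled (e.g. by ordering the means):

model{
    for(i in 1:N){
        # Each observation comes from one of two normal components:
        Y[i] ~ dnorm(mus[group[i]], taus[group[i]])
        group[i] ~ dcat(probs)
    }
    # Priors on the mixing weights and the component parameters:
    probs ~ ddirch(c(1,1))
    for(k in 1:2){
        mus[k] ~ dnorm(0, 10^-6)
        taus[k] ~ dgamma(0.01, 0.01)
    }
}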