
I'm teaching myself PyMC but got stuck with the following problem:

I have a model whose parameters should be determined from successive measurements. In the beginning the parameter's prior is uninformative, but it should be updated after each measurement (i.e. replaced by the posterior). In short, I want to do sequential updating with PyMC.

Consider the following (somewhat constructed) example:

  • Measurement 1: 10 questions, 9 correct answers
  • Measurement 2: 5 questions, 3 correct answers

Of course, this can be solved analytically with beta/binomial conjugate priors, but this is not the point here :)

Alternatively, both measurements could be combined to n=15 and k=12. However, this is too simple. I want to take the hard way for educational purposes.
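For reference, the analytic conjugate updates work out like this: a Beta(a, b) prior with k successes in n trials yields a Beta(a + k, b + n - k) posterior, and applying the two measurements in sequence gives the same result as combining them:

```python
# Conjugate beta/binomial updating: Beta(a, b) prior with k successes
# in n trials -> Beta(a + k, b + n - k) posterior.
a, b = 1, 1                  # uninformative Beta(1, 1) prior

a, b = a + 9, b + (10 - 9)   # measurement 1 (n=10, k=9) -> Beta(10, 2)
a, b = a + 3, b + (5 - 3)    # measurement 2 (n=5, k=3)  -> Beta(13, 4)

print(a, b)  # 13 4 -- identical to a single update with n=15, k=12
```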

I found a solution in this answer, where new priors are sampled from the posterior. This is almost what I want, but sampling the prior feels a bit messy because the result depends on the number of samples and other settings.

My attempted solution puts both measurement and priors separately in one model, like this:

import pymc  # PyMC2-style API

n1, k1 = 10, 9
n2, k2 = 5, 3

theta1 = pymc.Beta('theta1', alpha=1, beta=1)
outcome1 = pymc.Binomial('outcome1', n=n1, p=theta1, value=k1, observed=True)

theta2 = ?  # should be the posterior of theta1
outcome2 = pymc.Binomial('outcome2', n=n2, p=theta2, value=k2, observed=True)

How can I get the posterior of theta1 as the prior of theta2? Is this even possible, or did I just demonstrate ultimate ignorance about Bayesian statistics?

MB-F

1 Answer


The only way sequential updating works sensibly is in two different models. Specifying them in the same model does not make any sense, since we have no posteriors until after MCMC has completed.

In principle, you would examine the posterior distribution of theta1 and specify a prior that best resembles it. In this simple case it is easy -- it would be:

theta2 = pymc.Beta('theta2', alpha=10, beta=2)

since the model is conjugate and you don't need MCMC to determine the posterior of theta1: it is simply Beta(1 + 9, 1 + 1). More generally, you could fit a Beta distribution to the posterior samples, say using scipy.stats.beta.fit.
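The fitting step can be sketched like this. For illustration, draws from the analytically known Beta(10, 2) posterior stand in for an MCMC trace of theta1; in practice you would pass in the trace from the first model:

```python
import numpy as np
from scipy import stats

# Stand-in for an MCMC trace of theta1: direct draws from the
# analytically known Beta(10, 2) posterior.
posterior_samples = stats.beta.rvs(10, 2, size=100_000, random_state=0)

# Fit a Beta distribution to the samples, fixing loc=0 and scale=1
# so that only alpha and beta are estimated.
alpha_fit, beta_fit, _, _ = stats.beta.fit(posterior_samples, floc=0, fscale=1)

# The fitted parameters then define the prior for the second model:
# theta2 = pymc.Beta('theta2', alpha=alpha_fit, beta=beta_fit)
print(alpha_fit, beta_fit)  # close to 10 and 2
```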

Chris Fonnesbeck
  • Thank you for answering. If the model becomes more complicated I may no longer know what the actual posterior distribution is. In this case I would have to use kernel density estimation, or are there better options? – MB-F Dec 15 '14 at 15:24
  • In this case the posteriors are easy, because the model is conjugate. If you have enough data, you can assume your posteriors are multivariate normal, and just parameterize that based on the outputs of the previous model. – Chris Fonnesbeck Dec 15 '14 at 17:16
  • By enough data - do you mean enough samples drawn from the posterior or enough observed data? – MB-F Dec 15 '14 at 18:55
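A minimal sketch of the normal approximation mentioned in the last comment, again using draws from the known Beta(10, 2) posterior in place of a real trace; with several parameters this generalizes to a multivariate normal parameterized by the trace's mean vector and covariance matrix:

```python
import numpy as np
from scipy import stats

# Stand-in for posterior samples of theta1 from the first model.
samples = stats.beta.rvs(10, 2, size=100_000, random_state=0)

# Parameterize a normal approximation from the sample moments.
mu, sigma = samples.mean(), samples.std(ddof=1)

# PyMC2's Normal is parameterized by precision tau = 1 / sigma**2:
# theta2 = pymc.Normal('theta2', mu=mu, tau=1 / sigma**2)
print(mu, sigma)  # near the true mean 10/12 and sd ~0.103
```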