
I wrote a PyMC model for fitting 3 Normals to data (similar to the one in this question).

import numpy as np
import pymc as mc
import matplotlib.pyplot as plt

n = 3
ndata = 500

# simulated data
v = np.random.randint( 0, n, ndata)
data = (v==0)*(10+ 1*np.random.randn(ndata)) \
   + (v==1)*(-10 + 2*np.random.randn(ndata)) \
   + (v==2)*3*np.random.randn(ndata)

# the model: a mixture of 3 Normals
dd = mc.Dirichlet('dd', theta=(1,)*n)                    # mixture weights
category = mc.Categorical('category', p=dd, size=ndata)  # per-observation component assignment
precs = mc.Gamma('precs', alpha=0.1, beta=0.1, size=n)   # per-component precision
means = mc.Normal('means', 0, 0.001, size=n)             # per-component mean (tau=0.001 prior)

# pick each observation's mean and precision according to its assigned component
@mc.deterministic
def mean(category=category, means=means):
    return means[category]

@mc.deterministic
def prec(category=category, precs=precs):
    return precs[category]

obs = mc.Normal('obs', mean, prec, value=data, observed=True)

model = mc.Model({'dd': dd,
              'category': category,
              'precs': precs,
              'means': means,
              'obs': obs})

# find a decent starting point with MAP, then sample
M = mc.MAP(model)
M.fit()

# mcmc sampling
mcmc = mc.MCMC(model)
mcmc.use_step_method(mc.AdaptiveMetropolis, model.means)
mcmc.use_step_method(mc.AdaptiveMetropolis, model.precs)
mcmc.sample(100000, burn=0, thin=10)

tmeans = mcmc.trace('means').gettrace()
tsd = mcmc.trace('precs').gettrace()**-.5  # precisions -> standard deviations
plt.plot(tmeans)
#plt.errorbar(range(len(tmeans)), tmeans, yerr=tsd)
plt.show()

The distributions from which I sample my data are clearly overlapping, yet there are 3 well-distinct peaks (see image below). Fitting 3 Normals to this kind of data should be trivial, and I would expect it to recover the means I sample from (-10, 0, 10) in 99% of the MCMC runs.

[Image: data drawn from 3 Normal distributions]

Example of an outcome I would expect. This happened in 2 out of 10 cases.

[Image: MCMC trace producing a good fit]

Example of an unexpected result that happened in 6 out of 10 cases. This is weird because there is no peak in the data at -5, so I can't really see a serious local minimum that the sampling could get stuck in (going from (-5, -5) to (-6, -4) should improve the fit, and so on).

[Image: MCMC trace producing a bad fit]

What could be the reason that (adaptive Metropolis) MCMC sampling gets stuck in the majority of cases? What are possible ways to improve the sampling procedure so that it doesn't?

So the runs do converge, but do not really explore the right range.


Update: Using different priors, I get the right convergence (approximately the first picture) in 5/10 runs and the wrong one (approximately the second picture) in the other 5/10. Basically, the changed lines are the ones below, plus removing the AdaptiveMetropolis step methods:

precs = mc.Gamma('precs', alpha=2.5, beta=1, size=n)
means = mc.Normal('means', [-5, 0, 5], 0.0001, size=n)
Michael Schubert
  • Can you provide the model code too, please? I think I know why this happens, but I'd like the full context too. – Cam.Davidson.Pilon Oct 01 '13 at 12:58
  • The model code is exactly the same as linked. – Michael Schubert Oct 01 '13 at 16:19
  • Ah, the url was hidden cc: @stackoverflow (I wish that worked) – Cam.Davidson.Pilon Oct 01 '13 at 19:01
  • Would it be possible for you to put your original code either in the question or in another pastie? I think your link contains updated code and I'm trying to reconstruct what you originally had based on comments and answer (maybe I should write a PyMC model to help me). – Ahmed Fasih Oct 07 '13 at 03:52
  • Done. The code posted shows the exact same behaviour as explained, a link to the updated code with different priors is at the end (which still only produces the right result in 50% of cases). – Michael Schubert Oct 07 '13 at 10:37
  • @user538603, are you comfortable enough with the math & Python to implement a Gibbs sampler for this model? The Gamma prior on the normal clusters' precisions will be conjugate and you can readily find the updates (e.g., wikipedia). This model is a good fit for Gibbs sampling (I did this once for homework) & it would be nice to know PyMC can handle it at least with a Gibbs sampler, if not an automatic built-in sampling/tuning scheme. I am having similar problems with PyMC & nonparametric Dirichlet process-based Gaussian mixture model/clustering: http://stats.stackexchange.com/q/72196/31187. – Ahmed Fasih Oct 11 '13 at 11:31
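For reference, the conjugate precision update mentioned in the last comment looks roughly like this (a minimal sketch, assuming a rate-parameterized Gamma(alpha, beta) prior, a fixed component mean mu_k, and x_k holding the data points currently assigned to component k; all of these names are placeholders):

# Gibbs update for one component's precision under a conjugate Gamma(alpha, beta) prior
alpha_post = alpha + len(x_k) / 2.0
beta_post = beta + 0.5 * np.sum((x_k - mu_k) ** 2)
prec_k = np.random.gamma(alpha_post, 1.0 / beta_post)  # numpy's gamma takes a scale, i.e. 1/rate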

2 Answers


Is there a particular reason you would like to use AdaptiveMetropolis? I imagine that vanilla MCMC wasn't working, and you got something like this:

[Image: MCMC trace that fails to converge to the true means]

Yea, that's no good. There are a few comments I can make. Below I used vanilla MCMC.
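A minimal sketch of what "vanilla MCMC" amounts to here, assuming the same model object as in the question and simply letting PyMC pick its default step methods:

# no use_step_method() calls: PyMC assigns its default Metropolis samplers
mcmc = mc.MCMC(model)
mcmc.sample(100000, burn=0, thin=10)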

  1. Your means prior precision, 0.001, is too big. It corresponds to a standard deviation of about 31 (= 1/sqrt(0.001)), which is too small: you are really forcing your means to be close to 0. You want a much larger standard deviation to help explore the area. I decreased the precision to 0.00001 and got this:

[Image: MCMC trace converging to the true means]

Perfect. Of course, a priori I knew the true means were 50, 0, and -50. Usually we don't know this, so it's always a good idea to set that precision to be quite small (i.e., give the means a very wide prior).
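For reference, the only line that changes for the run above is the means prior, roughly like this (a sketch; tau = 0.00001 corresponds to a prior standard deviation of about 316):

means = mc.Normal('means', 0, 0.00001, size=n)  # much smaller precision = much wider prior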

2. Do you really think all the normals line up at 0, like your mean prior suggests? (You set the mean of all of them to 0) The point of this exercise is to find them to be different, so your priors should reflect that. Something like:

means = mc.Normal('means', [-5,0,5], 0.00001, size=n)

more accurately reflects your true belief. This actually also helps convergence by suggesting to the MCMC where the means should be. Of course, you'd have to use your best estimate to come up with these numbers (I've naively chosen -5,0,5 here).
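One hypothetical way to come up with such numbers from the data itself (the percentile-based choice here is an assumption for illustration, not something prescribed above) is to use spread-out quantiles as rough prior centers:

# crude, data-derived guesses for the prior centers (hypothetical illustration)
guess = np.percentile(data, [10, 50, 90])
means = mc.Normal('means', guess, 0.00001, size=n)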

Cam.Davidson.Pilon
  • Even with the changed prior it explodes 9 out of 10 times (note: I made a mistake, the data means were [-10,0,10]; but the result is the same for the 50s). The exact code I now use is linked here: http://pastebin.com/qauDYXM8 – Michael Schubert Oct 02 '13 at 10:15

The problem is caused by a low acceptance rate for the category variable. See the answer I gave to a similar question.
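A minimal sketch of how one might check that acceptance rate in PyMC 2 after sampling, assuming the mcmc and category objects from the question (step_method_dict maps each stochastic to the step methods assigned to it, and Metropolis-type step methods keep accepted/rejected counters):

# look up the step method handling 'category' and compute its acceptance rate
sm = mcmc.step_method_dict[category][0]
rate = float(sm.accepted) / (sm.accepted + sm.rejected)
print("acceptance rate for category: %.3f" % rate)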

Tom Minka