0

I have tried to use emcee library to implement Monte Carlo Markov Chain inside a class and also make multiprocessing module works but after running such a test code:

import numpy as np
import emcee
import scipy.optimize as op
# Choose the "true" parameters.
m_true = -0.9594
b_true = 4.294
f_true = 0.534

# Generate some synthetic data from the model.
N = 50
x = np.sort(10*np.random.rand(N))
yerr = 0.1+0.5*np.random.rand(N)
y = m_true*x+b_true
y += np.abs(f_true*y) * np.random.randn(N)
y += yerr * np.random.randn(N)

class modelfit():
      def  __init__(self):
          self.x=x
          self.y=y
          self.yerr=yerr
          self.m=-0.6
          self.b=2.0
          self.f=0.9
      def get_results(self):
          def func(a):
              model=a[0]*self.x+a[1]
              inv_sigma2 = 1.0/(self.yerr**2 + model**2*np.exp(2*a[2]))
              return 0.5*(np.sum((self.y-model)**2*inv_sigma2 + np.log(inv_sigma2)))
          result = op.minimize(func, [self.m, self.b, np.log(self.f)],options={'gtol': 1e-6, 'disp': True})
          m_ml, b_ml, lnf_ml = result["x"]
          return result["x"]
      def lnprior(self,theta):
          m, b, lnf = theta
          if -5.0 < m < 0.5 and 0.0 < b < 10.0 and -10.0 < lnf < 1.0:
             return 0.0
          return -np.inf
      def lnprob(self,theta):
          lp = self.lnprior(theta)
          likelihood=self.lnlike(theta)
          if not np.isfinite(lp):
             return -np.inf
          return lp + likelihood
      def lnlike(self,theta):
          m, b, lnf = theta
          model = m * self.x + b
          inv_sigma2 = 1.0/(self.yerr**2 + model**2*np.exp(2*lnf))
          return -0.5*(np.sum((self.y-model)**2*inv_sigma2 - np.log(inv_sigma2)))
      def run_mcmc(self,nstep):
          ndim, nwalkers = 3, 100
          pos = [self.get_results() + 1e-4*np.random.randn(ndim) for i in range(nwalkers)]
          self.sampler = emcee.EnsembleSampler(nwalkers, ndim, self.lnprob,threads=10)
          self.sampler.run_mcmc(pos, nstep)
test=modelfit()
test.x=x
test.y=y
test.yerr=yerr
test.get_results()
test.run_mcmc(5000)

I got this error message :

File "MCMC_model.py", line 157, in run_mcmc
    self.sampler.run_mcmc(theta0, nstep)
  File "build/bdist.linux-x86_64/egg/emcee/sampler.py", line 157, in run_mcmc
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 198, in sample
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 382, in _get_lnprob
  File "build/bdist.linux-x86_64/egg/emcee/interruptible_pool.py", line 94, in map
  File "/vol/aibn84/data2/zahra/anaconda/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

I reckon it has something to do with how I have used multiprocessing in the class but I could not figure out how I could keep the structure of my class the way it is and meanwhile use multiprocessing as well??!!

I will appreciate for any tips.

P.S. I have to mention the code works perfectly if I remove threads=10 from the last function.

Gabriel
  • 40,504
  • 73
  • 230
  • 404
Dalek
  • 4,168
  • 11
  • 48
  • 100
  • Also a side note, you could replace the `scipy.optimize` call with `mystic`, which can do optimization in parallel. That's got nothing to do with your error… just a FYI. See: https://github.com/uqfoundation. – Mike McKerns Mar 31 '15 at 12:22

2 Answers2

2

There are a number of SO questions that discuss what's going on:

  1. https://stackoverflow.com/a/21345273/2379433

  2. https://stackoverflow.com/a/28887474/2379433

  3. https://stackoverflow.com/a/21345308/2379433

  4. https://stackoverflow.com/a/29129084/2379433

…including this one, which seems to be your response… to nearly the same question:

  1. https://stackoverflow.com/a/25388586/2379433

However, the difference here is that you are not using multiprocessing directly -- butemcee is. Therefore, the pathos.multiprocessing solution (from the links above) is not available for you. Since emcee uses cPickle, you'll have to stick to things that pickle knows how to serialize. You are out of luck for class instances. Typical workarounds are to either use copy_reg to register the type of object you want to serialize, or to add a __reduce__ method to tell python how to serialize it. You can see several of the answers from the above links suggest similar things… but none enable you to keep the class the way you have written it.

Community
  • 1
  • 1
Mike McKerns
  • 33,715
  • 8
  • 119
  • 139
  • I have read some of the solutions in stackoverflow but as you mentioned I didn't find a way to apply it for my case. So what you are also suggesting is that writing a class in this application wouldn't work? This code is an example of my real code which I somehow need to keep the class structure because it is much easier to use class `instances` for the variables in different functions. – Dalek Mar 29 '15 at 12:53
  • It's not your code, it's `emcee` using `cPickle`, and thus can't pickle instance methods. You might try subclassing `EnsembleSampler` as that's essentially `Pool` -- just like `run_mcmc` is a `map`. You'll either need to show `cPickle` how to register instance methods (in general) in the pickle registry, or how to register your class instance methods. The alternative with `multiprocessing` is to use indirection to call the method as a function. You could complain to the `emcee` authors and submit a patch using `pathos` (which works) -- it's very likely a one line change. – Mike McKerns Mar 29 '15 at 13:57
  • Thanks for tips. I wrote my problem as an issue for emcee and got [this response](https://github.com/dfm/emcee/issues/148#issuecomment-88057695) – Dalek Mar 31 '15 at 12:53
  • but as it mentioned in the reply if I just attribute the sampler to a normal variable I would still get the previous error and then I added two given function suggested [here](http://stackoverflow.com/questions/22450160/using-multiprocessing-pool-on-bound-methods) to the beginning of my code and got this error: `AttributeError: ("class modelfit has no attribute 'mro'", , ('log_posterior', , )) Process PoolWorker-51: Traceback (most recent call last)` – Dalek Mar 31 '15 at 12:54
  • I've seen the suggestions in the link you reference. It's a common solution, but not a robust one. – Mike McKerns Mar 31 '15 at 14:22
1

For the record, you can now create a pathos.multiprocessing pool, and pass it to emcee using the pool argument. However, be aware that the overhead of multiprocessing can actually slow things down, unless your likelihood is particularly time-consuming to compute.

TheBamf
  • 597
  • 5
  • 8