
I'm trying to speed up my Python code with multiprocessing, using Python 2.7.6.

The following 'minimal' example actually works (as long as you have lmfit and parmap installed), and its structure is very similar to my actual code. My problem is that in my real code, parmap.starmap, which is basically a map() for multiple arguments (see this and this post), fails with

cPickle.PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

Is there a way to find out which instancemethod it can't pickle? As in the example, I made sure that the function I pass to parmap.starmap is defined at the top level of the module. It is actually the exact same function.
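One stdlib-only way to locate the culprit is to try pickling the object and, if that fails, recurse into its attributes and container items until the smallest failing piece is found. This is a minimal sketch: `find_unpicklable` and `Holder` are hypothetical names, not part of parmap, lmfit or pickle. It uses the pure-Python `pickle` module; cPickle on 2.7 should reject the same kinds of objects, but that is an assumption worth checking against the real traceback.

```python
from __future__ import print_function
import pickle


def find_unpicklable(obj, path="obj", seen=None):
    """Return (path, exception) pairs for the innermost parts of obj that fail to pickle.

    Hypothetical debugging helper: it walks dicts, lists/tuples/sets and instance
    __dict__ attributes, descending only into objects that fail to pickle.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:  # avoid infinite recursion on reference cycles
        return []
    seen.add(id(obj))
    try:
        pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
        return []  # this object and everything it references pickles fine
    except Exception as exc:  # PicklingError, TypeError, AttributeError, ...
        error = exc

    if isinstance(obj, dict):
        children = [("%s[%r]" % (path, k), v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple, set, frozenset)):
        # For sets the index is only a position in iteration order.
        children = [("%s[%d]" % (path, i), v) for i, v in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [("%s.%s" % (path, k), v) for k, v in vars(obj).items()]
    else:
        children = []

    found = []
    for child_path, child in children:
        found.extend(find_unpicklable(child, child_path, seen))
    # If none of the children is to blame, obj itself cannot be pickled.
    return found or [(path, error)]


class Holder(object):
    """Hypothetical stand-in for an object that hides an unpicklable attribute."""

    def __init__(self):
        self.data = [1, 2, 3]
        # A lambda fails to pickle on both Python 2 and 3.
        self.config = {"scale": 2.0, "callback": lambda v: v * 2}


holder = Holder()
for where, err in find_unpicklable(holder, "holder"):
    print(where, "->", type(err).__name__)
```

Calling it on one of the real A instances, e.g. `find_unpicklable(a_instance, "a_instance")`, before calling parmap.starmap should print the attribute path that leads to the offending instancemethod.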

When debugging my real code, I see that the function passed to starmap has a lot of entries under 'func_globals'. I don't know whether this has anything to do with the error, and I also don't know why some of those entries are there. So I guess my real question is: how could one make the following code FAIL by importing other modules or classes, and what should I look for in my real code to see whether that is my problem?

import numpy as np
from lmfit.models import GaussianModel
import parmap

def do_fitting(a_instance, y, x):
    # Defined at module top level, so pickle can send it to the workers by name.
    return a_instance.fit(x, y)

def gaussian(x, amp, cen, wid):
    return amp * np.exp(-(x-cen)**2 /wid)

class A(object):
    def __init__(self, model):
        super(A, self).__init__()
        self.model = model

    def fit(self, x, y):
        return self.model.fit(y, params=self.model.guess(y,x=x), x=x, verbose=False).result.params.valuesdict()

class B(object):
    def __init__(self):
        super(B,self).__init__()
        self.gaussians = []
        self.gmodels = {}
        self.x = np.linspace(-10,10)
        for n in range(400):
            self.gmodels[n] = A(GaussianModel())
            self.gaussians.append(gaussian(self.x, 2.33, 0.21, 1.51) + np.random.normal(0, 0.2, len(self.x)))

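    def do_fits_parallel(self):
        # Each task is an (A instance, noisy y) pair; self.x is appended as the last
        # argument, so every A instance (and its model) has to be picklable.
        results = parmap.starmap(do_fitting, zip(self.gmodels.itervalues(), self.gaussians), self.x)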
        return results

if __name__ == '__main__':
    b = B()
    print b.do_fits_parallel()
Julian S.
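A minimal sketch (stdlib only, should run on Python 2.7 and 3) of why the func_globals entries are probably a red herring: pickle stores a top-level function such as do_fitting as a reference (module name plus function name), so its globals are not serialized. What parmap.starmap does have to pickle is each argument tuple, i.e. every A instance with everything reachable from it, plus x. `top_level`, `FakeA` and `first_unpicklable_task` are hypothetical names. The lambda stands in for the real culprit because it fails on both Python versions; on Python 2.7, an attribute holding a bound method (for example a hypothetical `self.callback = self.fit` inside class A) should fail with exactly the instancemethod error quoted in the question, while Python 3 can pickle bound methods of picklable instances.

```python
from __future__ import print_function
import math
import pickle


def top_level(x):
    # Refers to the module-level global `math`; pickle does not ship that global.
    return math.sqrt(x)


class FakeA(object):
    """Hypothetical stand-in for class A that only holds plain data."""

    def __init__(self, scale):
        self.scale = scale


# 1. A top-level function is pickled by reference: module name + function name.
payload = pickle.dumps(top_level)
print(b"top_level" in payload, b"sqrt" in payload)  # True False
assert pickle.loads(payload) is top_level  # looked up again by name on load

# 2. The per-task argument tuples are what really gets pickled.
instances = [FakeA(1.0), FakeA(2.0), FakeA(3.0)]
instances[1].hook = lambda v: v  # hypothetical: a single unpicklable attribute
ys = [[0.0], [1.0], [2.0]]
tasks = list(zip(instances, ys))


def first_unpicklable_task(tasks):
    """Return (index, exception) for the first task that fails to pickle, else None."""
    for index, task in enumerate(tasks):
        try:
            pickle.dumps(task, pickle.HIGHEST_PROTOCOL)
        except Exception as exc:
            return index, exc
    return None


print(first_unpicklable_task(tasks))
```

Running the same pre-flight check on `list(zip(self.gmodels.itervalues(), self.gaussians))` in the real code would show which task is the first to break, before any worker process is involved.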
  • Have you seen http://stackoverflow.com/a/21345273/2379433? You could try `pathos` if you want a `map` that is capable of taking multiple arguments directly… or `multiprocess` if you just want a fork of `multiprocessing` that provides better serialization. Both `pathos` and `multiprocess` can generally serialize any class instances and instance methods in a `multiprocessing` context. – Mike McKerns Feb 11 '16 at 03:22
  • To answer your question directly: the `dill` serializer includes tools that can help detect the cause of a serialization failure. – Mike McKerns Feb 11 '16 at 03:24
  • According to https://stackoverflow.com/questions/26059764/python-multiprocessing-with-pathos, pathos works with a copy of what I pass to map(). Is that still correct? So I can't change anything in class A? – Julian S. Feb 11 '16 at 05:31
  • `pathos.multiprocessing` or any version/fork of `multiprocessing` will make a copy of the instance you pass to the new process. So you *can* make changes to the instance on the other process… but the only thing that will get returned to the original process is what you return from the map. Changes made to the instance on the other processes will not affect the original… unless you set up an object that uses shared memory. – Mike McKerns Feb 11 '16 at 14:28
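The point about copies can be illustrated without starting any processes: multiprocessing sends arguments to a worker by pickling them, so the worker effectively operates on `pickle.loads(pickle.dumps(obj))`. This is a minimal in-process sketch with hypothetical names `Model` and `worker_view`; in a real pool the returned object is pickled again on the way back, so the caller receives yet another copy rather than the original instance.

```python
from __future__ import print_function
import pickle


class Model(object):
    """Hypothetical stand-in for class A."""

    def __init__(self):
        self.fitted = False


def worker_view(obj):
    # Simulates what a worker receives: an unpickled copy of the argument.
    copy = pickle.loads(pickle.dumps(obj))
    copy.fitted = True  # mutate the copy, as a worker would
    return copy  # only what is returned travels back to the caller


original = Model()
result = worker_view(original)
print(original.fitted, result.fitted)  # False True
```

So to keep the results of work done on A instances, return the data (or the modified instance) from the mapped function and use what the map returns, as the comment above describes.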
