
This question is related to another one I posted a few days ago. I've read this question about multiprocessing failing to pickle instance methods, but I did not understand how to apply the solution provided there to my case:

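# Python 2 only: copy_reg and im_func/im_self/im_class were renamed or removed in Python 3
import copy_reg
import types

def _pickle_method(method):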
    # Author: Steven Bethard
    # http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods
    func_name = method.im_func.__name__
    obj = method.im_self
    cls = method.im_class
    cls_name = ''
    if func_name.startswith('__') and not func_name.endswith('__'):
        cls_name = cls.__name__.lstrip('_')
    if cls_name:
        func_name = '_' + cls_name + func_name
    return _unpickle_method, (func_name, obj, cls)

def _unpickle_method(func_name, obj, cls):
    # Author: Steven Bethard
    # http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods
    for cls in cls.mro():
        try:
            func = cls.__dict__[func_name]
        except KeyError:
            pass
        else:
            break
    return func.__get__(obj, cls)

copy_reg.pickle(types.MethodType, _pickle_method, _unpickle_method)

import numpy as n
from scipy.spatial import distance as dist

class Circle(Feature):  # Feature: base class defined elsewhere in my code
    # Stuff...
    def __points_distance(self, points):
        xa = n.array([self.xc, self.yc]).reshape((1, 2))
        d = n.abs(dist.cdist(points, xa) - self.radius)
        return d

    def points_distance(self, points, pool=None):
        if pool is not None:
            return pool.map(self.__points_distance, points)
        else:
            return self.__points_distance(points)

This raises a ValueError (XA must be a 2-dimensional array) when running this:

import tra.features as fts
import numpy as np
import multiprocessing as mp

points = np.random.random(size=(1000,2))
circle_points = np.random.random(size=(3,2))

feature = fts.Circle(circle_points)

pool = mp.Pool()
ds = feature.points_distance(points,pool=pool)

but it (obviously) works when doing:

pool = None
ds = feature.points_distance(points,pool=pool)

Any clues?

This is different from this (I checked that implementation) because the method is used inside another class that instantiates the Circle class and calls its points_distance method. Another difference is that the points_distance method uses scipy.spatial.distance.cdist, which expects an (n,2)-shaped numpy.ndarray. It works in the serial version but raises the exception above when run in parallel. I suppose there's a caveat in how arguments are passed with cPickle.
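For reference, the name-mangling branch in _pickle_method mirrors what Python itself does to class-private names: a method spelled __points_distance inside class Circle is stored in the class __dict__ under _Circle__points_distance, so the recipe has to rebuild that key before looking it up in _unpickle_method. A minimal Python 3 illustration; the Circle below is a stripped-down stand-in and mangled_name is a hypothetical helper that copies the recipe's key computation:

```python
class Circle:
    """Stand-in for the question's Circle; only the private method matters here."""

    def __points_distance(self, points):
        return points


def mangled_name(cls, func_name):
    """Hypothetical helper reproducing the key computation in _pickle_method."""
    if func_name.startswith('__') and not func_name.endswith('__'):
        cls_name = cls.__name__.lstrip('_')
        if cls_name:
            func_name = '_' + cls_name + func_name
    return func_name


key = mangled_name(Circle, '__points_distance')
print(key)                                      # _Circle__points_distance
print(key in Circle.__dict__)                   # True
print('__points_distance' in Circle.__dict__)   # False
```

Dunder names such as __init__ end with two underscores, so they are left unchanged, which matches Python's own mangling rule.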

rdbisme
  • possible duplicate of [Can't pickle when using python's multiprocessing Pool.map()](http://stackoverflow.com/questions/1816958/cant-pickle-type-instancemethod-when-using-pythons-multiprocessing-pool-ma) – User Jul 11 '15 at 11:17
  • @User thanks for the help. I already checked the answer you provided, but at the moment it is not working for me. – rdbisme Jul 11 '15 at 14:53

3 Answers


I think there is quite a bit of confusion here, so I'm not sure I understand the problem.

The exception NameError: global name 'pool' is not defined is not due to a pickling issue but rather to a scoping problem.

The method cannot find pool in its scope. Try fixing it by passing the pool reference to the method.
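A minimal sketch of the scoping point, using abs as a placeholder worker: a function that refers to a pool created in some other scope fails with NameError at call time, while one that receives the pool as a parameter works. The function names are hypothetical, and multiprocessing.dummy.Pool (a thread-backed pool with the same map interface) is used only so the example runs without spawning processes.

```python
from multiprocessing.dummy import Pool  # thread pool exposing the same map() API


def distances_from_global(values):
    # 'pool' is not defined in this module, so this lookup raises NameError
    return pool.map(abs, values)


def distances_from_argument(values, pool=None):
    # The caller passes the pool in explicitly; fall back to a serial map otherwise
    if pool is None:
        return list(map(abs, values))
    return pool.map(abs, values)


try:
    distances_from_global([-1, 2, -3])
except NameError as exc:
    print("NameError:", exc)

with Pool(2) as p:
    print(distances_from_argument([-1, 2, -3], pool=p))  # [1, 2, 3]
```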

One other thing:

pool = mp.Pool(mp.cpu_count())

The cpu_count() call is redundant: by default, Pool already starts as many worker processes as there are CPUs.

noxdafox
  • Thanks for your answer. I solved the scope issue, but now I'm getting another exception related to `scipy.spatial.distance.cdist`. If you know what's going on, feel free to edit your answer. – rdbisme Jul 15 '15 at 09:37
  • I'd rather you close this question as solved and open a new one explaining the new issue you encountered. Other people may read this question, and it would be easier for them to have a separate context. – noxdafox Jul 15 '15 at 10:27

The points array you pass to pool.map has a shape of (1000, 2). pool.map iterates over it and passes each row separately as the points argument to __points_distance, so each of those arrays only has shape (2,).

Try adding points.shape = (1, 2) to the body of __points_distance before the call to cdist.
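To see why, note that iterating over a 2-D NumPy array yields its rows, which is what pool.map hands to the worker. A NumPy-only sketch: row_distance is a hypothetical stand-in for __points_distance, the xc, yc, and radius defaults are made up, np.linalg.norm replaces scipy.spatial.distance.cdist so it runs without SciPy, and reshape((1, 2)) plays the role of the suggested points.shape = (1, 2).

```python
import numpy as np

points = np.random.random(size=(1000, 2))

# Iterating a (1000, 2) array yields 1000 separate rows of shape (2,), not (1, 2)
rows = list(points)
print(len(rows), rows[0].shape)  # 1000 (2,)


def row_distance(point, xc=0.5, yc=0.5, radius=0.25):
    """Hypothetical stand-in for __points_distance; xc, yc, radius are made up."""
    point = np.asarray(point).reshape((1, 2))   # the fix: restore a 2-D (1, 2) shape
    center = np.array([xc, yc]).reshape((1, 2))
    # Distance to the center minus the radius (cdist would return a (1, 1) array)
    return np.abs(np.linalg.norm(point - center, axis=1) - radius)


print(row_distance(rows[0]).shape)  # (1,)
```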

codewarrior
  • I'm a bit new to multiprocessing, but it seems like the way it splits your (1000, 2) array into a thousand (2,) arrays isn't a good use of numpy's capabilities. Maybe there's a way to, say, split a (1000000, 2) array into a thousand (1000, 2) arrays and send each of those to a worker. – codewarrior Jul 15 '15 at 09:58
  • I guess the obvious way is to reshape `points` to (10, 100, 2) and then `vstack` the results together.... – codewarrior Jul 15 '15 at 10:04
  • This is a good point. I'm worried that it has to be done manually using `multiprocessing.Queue` and `multiprocessing.Process`. I'll think about it a bit while waiting for other answers. Thanks for your help. – rdbisme Jul 15 '15 at 10:15
  • Pre-splitting the array into a list of chunks (say, with `numpy.split`) before passing it to `pool.map` could be a solution. – rdbisme Jul 15 '15 at 10:22
  • I think you're getting a bit far from your original question ;) – codewarrior Jul 15 '15 at 10:24
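The chunking idea from these comments can be sketched with np.array_split (which, unlike np.split, allows chunks of unequal length) and np.vstack. Both function names are hypothetical; _circle_distance is a module-level worker so that only plain arrays and floats need to be pickled, and mapper can be pool.map from a multiprocessing.Pool or the built-in map for a serial run.

```python
import functools

import numpy as np


def _circle_distance(chunk, xc, yc, radius):
    """Hypothetical worker: |distance to (xc, yc) - radius| for an (m, 2) chunk."""
    center = np.array([xc, yc])
    d = np.abs(np.linalg.norm(chunk - center, axis=1) - radius)
    return d.reshape((-1, 1))  # (m, 1), the same column layout cdist returns here


def chunked_distances(points, xc, yc, radius, n_chunks=4, mapper=map):
    # Split the (n, 2) array into n_chunks 2-D blocks instead of n separate rows
    chunks = np.array_split(points, n_chunks)
    worker = functools.partial(_circle_distance, xc=xc, yc=yc, radius=radius)
    # Each call receives a whole (k, 2) block; stack the partial results back together
    return np.vstack(list(mapper(worker, chunks)))
```

With a real process pool, call it under an if __name__ == '__main__': guard as chunked_distances(points, xc, yc, radius, mapper=pool.map). Fewer, larger chunks also mean far fewer pickling round trips than one task per row.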

The pool variable is defined outside of the Circle class, so points_distance() will be unable to find `pool` in its namespace.

Add a constructor to Circle or Feature which accepts a pool, and pass the pool you want to use to RansacFeature, which I assume instantiates Circles for you.
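A minimal sketch of that suggestion, in Python 3. Feature, Circle, and RansacFeature below are hypothetical stand-ins (the real classes are not shown), and the Circle constructor signature is invented for illustration. The pool is stored on the instance but never sent to the workers: multiprocessing pools cannot be pickled, so the worker is a module-level function that receives only arrays and floats.

```python
import functools

import numpy as np


def _circle_distance(points, xc, yc, radius):
    """Module-level worker, so mapping it never requires pickling the Circle."""
    center = np.array([xc, yc])
    return np.abs(np.linalg.norm(points - center, axis=1) - radius).reshape((-1, 1))


class Feature:
    def __init__(self, pool=None):
        self.pool = pool  # shared pool, or None for serial execution


class Circle(Feature):
    def __init__(self, xc, yc, radius, pool=None):
        super().__init__(pool=pool)
        self.xc, self.yc, self.radius = xc, yc, radius

    def points_distance(self, points, n_chunks=4):
        worker = functools.partial(_circle_distance, xc=self.xc, yc=self.yc,
                                   radius=self.radius)
        if self.pool is None:
            return worker(points)
        # Send 2-D chunks, not single rows, to the pool
        return np.vstack(self.pool.map(worker, np.array_split(points, n_chunks)))


class RansacFeature:
    """Hypothetical owner that creates Circles and hands each one the same pool."""

    def __init__(self, pool=None):
        self.pool = pool

    def make_circle(self, xc, yc, radius):
        return Circle(xc, yc, radius, pool=self.pool)
```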

knite