
I am writing an object-oriented program in which I am trying to use multiprocessing. I was getting pickle errors because, by default, Python can serialize functions but not instance methods. So I used the suggestion from Can't pickle <type 'instancemethod'> when using python's multiprocessing Pool.map(), but the problem is that if I have lambda expressions inside my methods, it does not work. My sample code is as follows:
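The root cause is easy to reproduce with the stdlib pickle module alone. A minimal sketch (the function name square is just for illustration; print() is used so it runs the same on 2.7 and 3):

```python
import pickle

def square(x):
    # A named module-level function: pickle stores it by reference to its name.
    return x * x

restored = pickle.loads(pickle.dumps(square))
print(restored(3))  # -> 9

# A lambda only carries the name '<lambda>', which cannot be looked up
# on unpickling, so pickle refuses to serialize it at all.
try:
    pickle.dumps(lambda x: x * x)
    lambda_pickled = True
except Exception:
    lambda_pickled = False

print(lambda_pickled)  # -> False
```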

import numpy as np

from copy_reg import pickle
from types import MethodType
from functools import partial
from collections import defaultdict


class test(object):
    def __init__(self,words):
        self.words=words
#         self.testLambda = defaultdict(lambda : 1.)

    def parallel_function(self,f):
        def easy_parallize(f,sequence):
            from multiprocessing import Pool
            pool = Pool(processes=50) # depends on available cores
            result = pool.map(f, sequence) # for i in sequence: result[i] = f(i)
            cleaned = [x for x in result if x is not None] # drop failed tasks
            cleaned = np.asarray(cleaned)
            pool.close() # not optimal! but easy
            pool.join()
            return cleaned
        return partial(easy_parallize, f)

    def dummy(self):
        self.t=defaultdict(lambda:1.)

    def test(self,a,b,x):
        print x
        print a
        return x*x

    def testit(self):
        sequence=[1,2,3,4,5]
        f1=partial(self.test,'a','b')
        f_p=self.parallel_function(f1)
        results=f_p(sequence)


def _pickle_method(method):
    # Reduce a bound method to (function name, instance, class) so that
    # pickle can serialize it; registered for MethodType in __main__ below.
    func_name = method.im_func.__name__
    obj = method.im_self
    cls = method.im_class
    return _unpickle_method, (func_name, obj, cls)

def _unpickle_method(func_name, obj, cls):
    # Walk the MRO so an overridden method resolves to the defining class.
    for klass in cls.mro():
        try:
            func = klass.__dict__[func_name]
        except KeyError:
            pass
        else:
            break
    return func.__get__(obj, cls)



if __name__ == "__main__":
    pickle(MethodType, _pickle_method, _unpickle_method)
    t=test('fdfs')
    t.dummy()
    t.testit()

But I get the following error due to the lambda expression:

Traceback (most recent call last):
  File "/home/ngoyal/work/nlp_source/language-change/test.py", line 76, in <module>
    t.testit()
  File "/home/ngoyal/work/nlp_source/language-change/test.py", line 51, in testit
    results=f_p(sequence)
  File "/home/ngoyal/work/nlp_source/language-change/test.py", line 28, in easy_parallize
    result = pool.map(f, sequence) # for i in sequence: result[i] = f(i)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

Is there any straightforward way to tackle this without moving to some other package that uses dill or something similar? Can this be done with the standard Python libraries? (I am using Python 2.7.)

Naman

2 Answers


The pickle module can't serialize lambda functions: it serializes functions by reference to their module-qualified name, and a lambda's name (<lambda>) can't be looked up for re-import. Just use a conventional, module-level function and it should work.
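Applied to the defaultdict from the question: a named, module-level factory (here a hypothetical one()) pickles where lambda : 1. does not. A sketch:

```python
import pickle
from collections import defaultdict

def one():
    # Module-level default factory: pickle can find it by name,
    # unlike the `lambda: 1.` used in the question.
    return 1.0

d = defaultdict(one)
d['seen'] += 1  # 1.0 (the default) + 1

copy = pickle.loads(pickle.dumps(d))
print(copy['seen'])  # -> 2.0
print(copy['new'])   # -> 1.0, the factory survived the round trip
```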

cdonts

If you look further down in the link you posted… to my answer (https://stackoverflow.com/a/21345273/2379433), you'll see that you can indeed do what you want to do… even if you use lambdas and defaultdicts and all sorts of other Python constructs. All you have to do is replace multiprocessing with pathos.multiprocessing… and it works. Note that I'm even working in the interpreter.

>>> import numpy as np
>>> from functools import partial
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> from collections import defaultdict
>>> 
>>> class test(object):
...   def __init__(self, words):
...     self.words = words
...   def parallel_function(self, f):
...     def easy_parallelize(f, sequence):
...       p = Pool()
...       result = p.map(f, sequence)
...       cleaned = [x for x in result if not x is None]
...       cleaned = np.asarray(cleaned)
...       return cleaned
...     return partial(easy_parallelize, f)
...   def dummy(self):
...     self.t = defaultdict(lambda: 1.)
...   def test(self, a, b, x):
...     print x
...     print a
...     print x*x
...   def testit(self):
...     sequence = [1,2,3,4,5]
...     f1 = partial(self.test, 'a','b')
...     f_p = self.parallel_function(f1)
...     results = f_p(sequence)
...     return results
... 
>>> t = test('fdfs')
>>> t.dummy()
>>> t.testit()
1
a
1
2
a
4
3
a
9
4
a
16
5
a
25
array([], dtype=float64)

"It works" because pathos uses dill, which is a serializer that can pickle almost anything in python. You can even dynamically replace the method, and it still works.

>>> def parallel_function(self, f):
...   def easy_parallelize(f, sequence):
...     p = Pool()
...     return p.map(f, sequence)
...   return partial(easy_parallelize, f)
... 
>>> test.parallel_function = parallel_function
>>> 
>>> t.testit()
1
a
1
2
a
4
3
a
9
4
a
16
5
a
25
[None, None, None, None, None]
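The difference dill makes can also be checked directly, independent of pathos. A sketch, guarded in case dill is not installed (pip install dill):

```python
# dill serializes a lambda by value (shipping its code object), where the
# stdlib pickle serializes functions by reference and so rejects lambdas.
try:
    import dill
except ImportError:
    dill = None  # dill is optional here; install it with `pip install dill`

if dill is not None:
    square = dill.loads(dill.dumps(lambda x: x * x))
    print(square(5))  # -> 25
```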

Get pathos and dill here: https://github.com/uqfoundation

Mike McKerns
  • Although ideally I would have loved to do this with the standard distribution, since that does not seem possible and pathos looks so straightforward, I will definitely give it a try. Thanks a lot for your great response. I do hope that someday dill replaces pickle in the standard Python distribution. – Naman Mar 06 '15 at 03:24
  • I am sorry to comment so late, but I want to know: is there a single pathos package that I can install from one tar with all the dependencies like dill, pyina and pox? – Naman Mar 13 '15 at 03:27
  • @Naman: no, there's not a distribution that contains all the dependencies. However, you can install the versions on github using setuptools, or pip with a little more work. A new release is a bit overdue, but should be available soon. – Mike McKerns Mar 13 '15 at 10:04
  • Thanks for the reply and the help. It'd be great to have a new pip release. – Naman Mar 13 '15 at 17:46