
I have a list of class instances, and I want to call the same instance method on each of them in parallel. I use pathos because it can pickle instance methods. The real problem is that when the method changes or adds an attribute on an instance, the change is lost. I think this is because pickling the inputs for the subprocess effectively makes a deep copy of them. Does anyone have an idea how to solve this? I don't want to change the way the instance method is written (for example, by returning a value and putting the results together later).

from joblib import Parallel, delayed
import pathos.multiprocessing as mp
# import multiprocessing as mp 
import random
import os

pool = mp.Pool(mp.cpu_count())

class Person(object):
    def __init__(self, name):
        self.name = name

    def print_name(self, num):
        self.num = num
        print "worker {}, person name {}, received int {}".format(os.getpid(), self.name, self.num)


people = [Person('a'),
          Person('b'),
          Person('c'),
          Person('d'),
          Person('e'),
          Person('f'),
          Person('g'),
          Person('h')]


for i, per in enumerate(people):
    pool.apply_async(Person.print_name, (per, i) )

pool.close()
pool.join()
print 'their number'
for per in people:
    print per.num

This is the output. The `num` attribute is not found; I think that is because the change was made on those copies.

In [1]: run delme.py
worker 13981, person name a, random int 0
worker 13982, person name b, random int 1
worker 13983, person name c, random int 2
worker 13984, person name d, random int 3
worker 13985, person name e, random int 4
worker 13986, person name f, random int 5
worker 13987, person name g, random int 6
worker 13988, person name h, random int 7
their number
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/chimerahomes/wenhoujx/brain_project/network_analysis/delme.py in <module>()
     39 print 'their number'
     40 for per in people:
---> 41     print per.num

AttributeError: 'Person' object has no attribute 'num'
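The effect can be reproduced without any pool. This is a minimal sketch that assumes a plain stdlib `pickle` round trip as a stand-in for what the pool does with the arguments of `apply_async` (pathos uses `dill` instead, but the copy semantics are the same): each argument is serialized in the parent and rebuilt in the worker, so `self.num = num` modifies the rebuilt copy, never the parent's object. No real process is started here.

```python
import pickle


class Person(object):
    def __init__(self, name):
        self.name = name

    def print_name(self, num):
        self.num = num


people = [Person("a"), Person("b"), Person("c")]
worker_copies = []

for i, per in enumerate(people):
    # Simulated transfer to a worker: pickle in the parent, unpickle "remotely".
    # This yields an independent copy, not a reference to `per`.
    worker_copy = pickle.loads(pickle.dumps(per))
    worker_copy.print_name(i)  # the attribute lands on the copy only
    worker_copies.append(worker_copy)

print([hasattr(per, "num") for per in people])  # [False, False, False]
print([w.num for w in worker_copies])            # [0, 1, 2]
```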

Following the suggestion in the comments, I tried returning `self` from the child process, but there seems to be a pathos bug: the returned `self` is NOT of its original type. See the following code:

import pickle
# from joblib import Parallel, delayed
import pathos.multiprocessing as mp
# import multiprocessing as mp 
import random
import os

pool = mp.Pool(mp.cpu_count())

class Person(object):
    def __init__(self, name):
        self.name = name

    def print_name(self, num):
        self.num = num
        print "worker {}, person name {}, received int {}".format(os.getpid(), self.name, self.num)
        # return itself and put everything together
        return self



people = [Person('a'),
          Person('b'),
          Person('c'),
          Person('d'),
          Person('e'),
          Person('f'),
          Person('g'),
          Person('h')]

# Parallel(n_jobs=-1)(delayed(Person.print_name)(per) for per in people)

res = []
for i, per in enumerate(people):
    res.append(pool.apply_async(Person.print_name, (per, i) ))

pool.close()
pool.join()
people = [rr.get() for rr in res]


print 'their number'
for per in people:
    print per.num

print isinstance(people[0], Person)

And this is the output:

In [1]: run delme.py
worker 29963, person name a, received int 0
worker 29962, person name b, received int 1
worker 29964, person name c, received int 2
worker 29962, person name d, received int 3
worker 29966, person name e, received int 4
worker 29967, person name f, received int 5
worker 29966, person name g, received int 6
worker 29967, person name h, received int 7
their number
0
1
2
3
4
5
6
7
False

When I use the standard-library multiprocessing package instead, this problem does not occur.

fast tooth
  • Very similar question here: http://stackoverflow.com/questions/26059764/python-multiprocessing-with-pathos. And the answer is essentially an extended version of what @tdelaney gives below. – Mike McKerns Nov 08 '14 at 18:59
  • Your edit doesn't demonstrate a bug; it's a feature. If you were working in `__main__` and you looked at `people[0].__class__`, it would be ``, which you might think is `Person`… but it's not. When you pass along a class instance, by default `dill` pickles the instance and the class definition; thus, you generate a new instance from the pickled class on return from multiprocessing. `pickle` serializes by reference, `dill` does not (by default). With `dill`, you *can* serialize by reference, and thus reference the original class. `pathos.multiprocessing` uses `dill`. – Mike McKerns Nov 09 '14 at 17:22
  • With `dill`, the class definition is serialized along with the instance… so before you unpickle, you can change the class definition, delete it, or otherwise munge it, and the instance will still unpickle correctly. You can even unpickle the class instance in a completely new Python session where you didn't import or define the class. The point is that this is the default, and if you use the `byref` flag, you can revert to the `pickle` behavior. Maybe it's a bit unexpected, but it's a feature. – Mike McKerns Nov 09 '14 at 17:27
  • If you think this shouldn't be a feature, or if you think the feature should be modified in some way, please submit a ticket at https://github.com/uqfoundation/pathos – Mike McKerns Nov 09 '14 at 17:33
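To make "`pickle` serializes by reference" concrete, here is a minimal stdlib-only sketch. It uses plain `pickle`, not `dill`, so it does not demonstrate the `dill` default or its `byref` flag. The pickled bytes record the class by module and name, and `loads` looks that name up again at load time; that lookup is why the standard-library run gets the original `Person` back and `isinstance` is True. Rebinding `Person` to a second class midway is a contrived illustration, not something you would do in real code, and it assumes the snippet is run as a script (so the class lives in `__main__`).

```python
import pickle


class Person(object):
    def __init__(self, name):
        self.name = name


payload = pickle.dumps(Person("a"))

# The payload names the class instead of embedding its definition.
print(b"Person" in payload)      # True

restored = pickle.loads(payload)
print(type(restored) is Person)  # True: the name resolves to the same class

# Illustration only: rebind the name `Person` to a different class.
OriginalPerson = Person


class Person(object):
    greeting = "I am the redefined class"


# loads() looks `__main__.Person` up again, so the same bytes now
# produce an instance of whatever that name refers to.
restored_again = pickle.loads(payload)
print(type(restored_again) is OriginalPerson)             # False
print(restored_again.name, "|", restored_again.greeting)  # a | I am the redefined class
```

`dill`, by default, would instead carry the class definition inside the payload, which is why the pathos round trip hands back an object whose class is not the parent's `Person`.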

1 Answer


The problem is that `self.num` is assigned in the child process. multiprocessing does not pass the original object back to the caller; it only passes the method's return value back. So you could pass `num` back directly, or even `self` (but that is generally inefficient, and it doesn't replace the existing object in the parent; it just creates a new one).
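A minimal sketch of passing `num` back directly, assuming Python 3 and the standard-library `multiprocessing` rather than the question's Python 2 and pathos (the question already uses the same `apply_async`/`get` calls, so the pattern should carry over, but that is not verified here). The method returns its value, and the parent writes the returned values onto its own instances. `run` is a hypothetical helper name.

```python
import multiprocessing as mp
import os


class Person(object):
    def __init__(self, name):
        self.name = name

    def print_name(self, num):
        # Runs in a worker process on an unpickled copy of the instance.
        self.num = num
        print("worker {}, person name {}, received int {}".format(
            os.getpid(), self.name, self.num))
        # Return the value so the parent can apply it to its own object.
        return num


def run(people):
    """Hypothetical helper: call print_name in parallel, then copy the
    returned values onto the parent's original instances."""
    with mp.Pool(mp.cpu_count()) as pool:
        results = [pool.apply_async(Person.print_name, (per, i))
                   for i, per in enumerate(people)]
        # get() waits for each task and re-raises any exception from the worker.
        values = [r.get() for r in results]
    for per, value in zip(people, values):
        per.num = value
    return people


if __name__ == "__main__":
    people = [Person(name) for name in "abcdefgh"]
    run(people)
    print("their number")
    for per in people:
        print(per.num)
```

Because only small return values cross the process boundary, the parent keeps its original objects (so `isinstance` and identity checks behave as expected), whereas returning `self` ships the whole instance back and, as noted above, yields new objects.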

tdelaney
  • I'm the `dill` and `pathos` author. There are plans to enable this type of behavior, and it should be straightforward to do… however what you describe is right on -- and your suggestion to pass back the return value is the way to go at the moment. – Mike McKerns Nov 08 '14 at 19:03
  • @MikeMcKerns - thanks for the info. I was worried that I may have missed something about pathos, so I'm glad to hear that I got it right. Updating the parent would be an interesting feature... and potentially disastrous if you didn't know it was happening! – tdelaney Nov 08 '14 at 20:26
  • I totally agree. It's to be a keyword option, that's False by default, and probably limited to special cases. Not sure how it'll play out yet, however. – Mike McKerns Nov 08 '14 at 23:05
  • @MikeMcKerns, can you read the edit to the question? It seems there is a bug in the pathos package. – fast tooth Nov 09 '14 at 14:21