I have a list of class instances and want to call the same instance method on each of them in parallel. I use pathos so that the instance method can be pickled. The real problem is that when I change or add an attribute on the instances inside the method, the change does not take effect. I think this is because pickling the inputs for the sub-process amounts to a deep copy. Does anyone have an idea how to solve this? I would rather not change the way the instance method is written (for example, by returning a value and putting the results together later).
from joblib import Parallel, delayed
import pathos.multiprocessing as mp
# import multiprocessing as mp
import random
import os

pool = mp.Pool(mp.cpu_count())

class Person(object):
    def __init__(self, name):
        self.name = name

    def print_name(self, num):
        self.num = num
        print "worker {}, person name {}, received int {}".format(os.getpid(), self.name, self.num)

people = [Person('a'),
          Person('b'),
          Person('c'),
          Person('d'),
          Person('e'),
          Person('f'),
          Person('g'),
          Person('h')]

for i, per in enumerate(people):
    pool.apply_async(Person.print_name, (per, i))

pool.close()
pool.join()

print 'their number'
for per in people:
    print per.num
This is the output. The num attribute is not found; I think this is because the change is made on the copies in the worker processes, not on the original instances.
In [1]: run delme.py
worker 13981, person name a, random int 0
worker 13982, person name b, random int 1
worker 13983, person name c, random int 2
worker 13984, person name d, random int 3
worker 13985, person name e, random int 4
worker 13986, person name f, random int 5
worker 13987, person name g, random int 6
worker 13988, person name h, random int 7
their number
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/chimerahomes/wenhoujx/brain_project/network_analysis/delme.py in <module>()
     39 print 'their number'
     40 for per in people:
---> 41     print per.num

AttributeError: 'Person' object has no attribute 'num'
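To illustrate the suspected cause without any process pool, here is a minimal sketch. It assumes that sending an instance to a worker behaves like handing the worker an independent copy, and it uses copy.deepcopy as a stand-in for the pickle round trip (that substitution is an assumption made for illustration, not what pathos does internally). The print statement is left out of print_name to keep the sketch short.

```python
import copy


class Person(object):
    def __init__(self, name):
        self.name = name

    def print_name(self, num):
        # Same attribute assignment as in the question
        self.num = num


original = Person('a')

# Stand-in for the pickle round trip: the "worker" gets an independent copy
worker_copy = copy.deepcopy(original)
worker_copy.print_name(0)

# The copy gained the attribute, the original in the parent did not
print(hasattr(worker_copy, 'num'))
print(hasattr(original, 'num'))
```

If this model is right, the AttributeError above is expected: the parent process never sees the attribute that was set on the worker's copy.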
Following a suggestion in the comments, I tried returning self from the child process, but there seems to be a pathos bug: the returned object is NOT an instance of its original class. See the following code:
import pickle
# from joblib import Parallel, delayed
import pathos.multiprocessing as mp
# import multiprocessing as mp
import random
import os

pool = mp.Pool(mp.cpu_count())

class Person(object):
    def __init__(self, name):
        self.name = name

    def print_name(self, num):
        self.num = num
        print "worker {}, person name {}, received int {}".format(os.getpid(), self.name, self.num)
        # return itself and put everything together
        return self

people = [Person('a'),
          Person('b'),
          Person('c'),
          Person('d'),
          Person('e'),
          Person('f'),
          Person('g'),
          Person('h')]

# Parallel(n_jobs=-1)(delayed(Person.print_name)(per) for per in people)
res = []
for i, per in enumerate(people):
    res.append(pool.apply_async(Person.print_name, (per, i)))

pool.close()
pool.join()

people = [rr.get() for rr in res]

print 'their number'
for per in people:
    print per.num

print isinstance(people[0], Person)
And this is the output:
In [1]: run delme.py
worker 29963, person name a, received int 0
worker 29962, person name b, received int 1
worker 29964, person name c, received int 2
worker 29962, person name d, received int 3
worker 29966, person name e, received int 4
worker 29967, person name f, received int 5
worker 29966, person name g, received int 6
worker 29967, person name h, received int 7
their number
0
1
2
3
4
5
6
7
False
When I use the standard multiprocessing package instead, this problem does not occur.
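One possible explanation for the False result, stated here as an assumption rather than something verified against pathos, is that the class itself gets re-created when the result is unpickled, so the returned object belongs to a different class object that merely shares the name Person. The sketch below mimics that situation with a hypothetical factory function, make_person_class, and shows one way to sidestep it: copy the returned state into the original instances instead of replacing them.

```python
def make_person_class():
    """Hypothetical helper: builds a fresh Person class on every call,
    mimicking a class that is re-created instead of looked up by name."""
    class Person(object):
        def __init__(self, name):
            self.name = name

        def print_name(self, num):
            self.num = num
            return self

    return Person


Person = make_person_class()        # the class as seen by the parent
WorkerPerson = make_person_class()  # assumed re-created class on the way back

people = [Person(name) for name in 'abc']

# Simulate the objects that come back from the workers
returned = [WorkerPerson(per.name).print_name(i) for i, per in enumerate(people)]

# Same class name, but a different class object
print(isinstance(returned[0], Person))

# Merge the returned state into the original instances
for per, res in zip(people, returned):
    per.__dict__.update(res.__dict__)

print([per.num for per in people])
print(isinstance(people[0], Person))
```

With this approach the objects in people keep their original class and identity, and only their attributes are refreshed from whatever the workers send back. It does rely on the instances storing their state in a plain `__dict__`.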