I'm new to parallel processing but have an application for which it will be useful.
I have ~10-100k object instances (of type ClassA), and I want to use the multiprocessing module to distribute the work of calling a particular class method on each of the objects. I've read most of the multiprocessing documentation and several posts about calling class methods, but I have an additional complication: the ClassA objects all have an attribute pointing to the same instance of another type (ClassB), to/from which they may add/remove themselves or other objects. I know sharing state is bad for concurrent processes, so I'm wondering if this is even possible. To be honest, the Proxy/Manager multiprocessing methods are a bit over my head, and I don't fully understand their implications for shared objects, but if someone assured me that I could get this to work I'd spend more time understanding them. If not, this will be a lesson in designing for distributed processes.
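For what it's worth, here is a minimal sketch of how I understand the Manager approach to work (placeholder names, not my actual code): a Manager hosts objects in a separate server process and hands out proxies, so mutations made by workers go through the proxy to the one shared object.

from multiprocessing import Manager, Pool

def append_value(args):
    shared_list, value = args
    shared_list.append(value)  # goes through the proxy to the managed list

if __name__ == '__main__':
    manager = Manager()
    shared_list = manager.list()  # lives in the Manager's server process
    pool = Pool(processes=2)
    pool.map(append_value, [(shared_list, i) for i in range(4)])
    print list(shared_list)  # all four appends are visible in the parent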
Here is a simplified version of my problem:
class ClassA(object):
    def __init__(self, classB_state1, classB_state2, another_obj):
        # Pointers to shared ClassB instances
        self.state1 = classB_state1
        self.state2 = classB_state2
        self.state1.add(self)
        self.object = another_obj

    def run(self, classB_anothercommonpool):
        # do something to self.object, then hand it off to the common pool
        if self.object.property:  # stand-in for "some property of self.object"
            classB_anothercommonpool.add(self.object)
            self.object = None
        self.switch_states()

    def switch_states(self):
        # Move self from one shared state set to the other
        if self in self.state1:
            self.state1.remove(self)
            self.state2.add(self)
        elif self in self.state2:
            self.state2.remove(self)
            self.state1.add(self)
        else:
            print "State switch failed!"
class ClassB(set):
    # This is essentially a glorified set with a hash so I can have sets of
    # sets (see the hashability aside after this listing).
    # If that's a bad design choice, I'd also be interested in knowing why
    def __init__(self, name):
        self.name = name
        super(ClassB, self).__init__()

    def __hash__(self):
        return id(self)
class ClassC(object):
    def __init__(self, property):
        self.property = property
# Define an importable function for the ClassA method, for multiprocessing
def unwrap_ClassA_run(classA_instance):
    return classA_instance.run(anothercommonpool)

def initialize_states():
    global state1
    global state2
    global anothercommonpool
    state1 = ClassB("state1")
    state2 = ClassB("state2")
    anothercommonpool = ClassB("objpool")
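(As an aside on the ClassB hash comment above: a plain set can't be a member of another set because it's mutable and therefore unhashable, which is what the __hash__ override works around. A quick illustration:)

s = set()
# {s}  ->  TypeError: unhashable type: 'set'
fs = frozenset([1, 2])
outer1 = {fs}        # frozensets are hashable, so this works
b = ClassB("inner")
outer2 = {b}         # so does ClassB, because it defines __hash__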
Now, within the same .py file in which the classes are defined:
from multiprocessing import Pool

def test_multiprocessing():
    initialize_states()
    # There are actually 10-100k ClassA instances
    object1 = ClassC('iamred')
    object2 = ClassC('iamblue')
    classA1 = ClassA(state1, state2, object1)
    classA2 = ClassA(state1, state2, object2)
    pool = Pool(processes=2)
    pool.map(unwrap_ClassA_run, [classA1, classA2])
If I import this module in an interpreter and run test_multiprocessing(), it completes with no errors at runtime, but the "State switch failed!" message is printed. If you then examine classA1 and classA2, they have not modified their respective object1/object2, nor have they switched membership between the two ClassB state sets (so the ClassA objects do not register that they are members of the state1 set).
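For reference, here is a stripped-down version of what I think is happening: each worker process operates on copies of the parent's objects, so mutations made in the workers never propagate back (this is my understanding of the fork/pickling semantics, not something I've verified):

from multiprocessing import Pool

counter = {'hits': 0}

def bump(i):
    counter['hits'] += 1  # mutates this worker's copy only
    return counter['hits']

if __name__ == '__main__':
    pool = Pool(processes=2)
    print pool.map(bump, range(4))  # per-worker counts, e.g. [1, 2, 1, 2]
    print counter['hits']           # still 0 in the parent process

Thanks!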