
I can't figure out whether this is my mistake or a problem with Python 2.7's multiprocessing module. Can anyone see why this is not working?

import sys
from multiprocessing import Pool as mp

class encapsulation:
    def __init__(self):
        self.member_dict = {}
    def update_dict(self, index, value):
        self.member_dict[index] = value

encaps = encapsulation()

def method(argument):
    encaps.update_dict(argument, argument)
    print encaps.member_dict

p = mp()  # sets up a multiprocessing pool of worker processes
p.map(method, sys.argv[1:])  # method is the function; sys.argv[1:] is the list of arguments to multiprocess
print encaps.member_dict
>>>{argument:argument}
>>>{}

So my question is just about member variables. My understanding is that the encapsulation class should hold this dictionary both inside and outside of the function. Why does it reset and give me an empty dictionary even though I initialized it only once? Please help.

jdi
jwillis0720

1 Answer


Even though you are encapsulating the object, the multiprocessing module ends up using a local copy of the object in each worker process and never propagates your changes back to the parent. You are also not using Pool.map as intended: it expects each call to return a result, which is collected into the list that map returns. If what you want is to mutate a shared object, then you need a manager, which coordinates the shared memory:
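That failure mode can be reproduced with a minimal sketch (the `worker` name and the pool size here are illustrative, not from the question):

```python
from multiprocessing import Pool

# Each pool worker runs in a separate process with its own copy of this
# module-level dict, so mutations made inside `worker` never propagate
# back to the parent process.
shared = {}

def worker(arg):
    shared[arg] = arg      # updates the child process's copy only
    return arg             # only return values travel back to the parent

if __name__ == '__main__':
    pool = Pool(2)
    pool.map(worker, ['a', 'b', 'c'])
    pool.close()
    pool.join()
    print(shared)          # the parent's dict is still empty
```

The parent's `shared` stays `{}` no matter what the workers do to their copies.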

Encapsulating a shared object

import sys
from multiprocessing import Pool, Manager

class encapsulation:
   def __init__(self):
       self.member_dict = {}
   def update_dict(self,index,value):
       self.member_dict[index] = value

encaps = encapsulation()

def method(argument):
   encaps.update_dict(argument,argument)
   # print encaps.member_dict       

manager = Manager()
encaps.member_dict = manager.dict()

p = Pool()
p.map(method,sys.argv[1:])

print encaps.member_dict

output

$ python mp.py a b c
{'a': 'a', 'c': 'c', 'b': 'b'}

I would suggest not setting the shared object as the member attribute, but rather passing it in as an argument, or encapsulating the shared object itself and then copying its values into your dict. The shared proxy object cannot be kept around persistently; it needs to be emptied and discarded:

# copy the values to a reg dict
encaps.member_dict = encaps.member_dict.copy()

But this might even be better:

import sys
from multiprocessing import Pool, Manager

class encapsulation:
   def __init__(self):
       self.member_dict = {}
   # normal dict update
   def update_dict(self,d):
       self.member_dict.update(d)

encaps = encapsulation()

manager = Manager()
results_dict = manager.dict()

# pass in the shared object only
def method(argument):
   results_dict[argument] = argument    

p = Pool()
p.map(method,sys.argv[1:])

encaps.update_dict(results_dict)

Using the pool.map as intended

If you were using map to return values, it might look like this:

def method(argument):
   encaps.update_dict(argument,argument)
   return encaps.member_dict

p = Pool()
results = p.map(method,sys.argv[1:]) 
print results
# [{'a': 'a'}, {'b': 'b'}, {'c': 'c'}]

You would need to combine the results into your dict again:

for result in results:
    encaps.member_dict.update(result)
print encaps.member_dict
# {'a': 'a', 'c': 'c', 'b': 'b'}
jdi
  • it worked. The problem now is when I try and iterate through the new dictionary I get this. – jwillis0720 Jul 04 '12 at 02:15
  • See my update on copying out the data from your shared object instead of keeping it around. – jdi Jul 04 '12 at 02:48
  • Thanks so much, what a great, well thought out answer! – jwillis0720 Jul 08 '12 at 00:48
  • Why do you need to copy out the data from the shared object? – Sveltely Jul 08 '15 at 16:45
  • @Sveltely, it's been a while since I wrote this, but if memory serves, I believe it was because the `dict` returned from the `Manager` is a `Proxy`, and its lifetime is tied to the `Manager`, which is associated with a spawned child process. So once you are done generating the shared data, you would ideally not want to continue using the proxy, and should collect the results into a standard container. – jdi Jul 08 '15 at 21:29