
I am using multiprocessing.Pool to distribute the work of a method across several processors. When I add something to a dictionary inside the worker, the change is lost after the method executes. Why is that, and how can I work around it?

from multiprocessing import Pool


class Agent:
    def __init__(self):
        self.test_dict = {}

    def apply(self, num):
        # something very processor intensive here
        self.test_dict[num] = num
        print('inside ', self.test_dict)

def F(x):
    agent, i = x
    return agent.apply(i)

class SeriesInstance(object):
    def __init__(self):
        self.agent = Agent()
        self.F = F

    def run(self):
        p = Pool()

        for i in range(5):
            out = p.map(F, [(self.agent, i),])

            print('outside', self.agent.test_dict)


        p.close()
        p.join()

        return out

if __name__ == '__main__':
    SeriesInstance().run()

The output is shown below, but I expected outside to match inside:

inside  {0: 0}
outside {}
inside  {1: 1}
outside {}
inside  {2: 2}
outside {}
inside  {3: 3}
outside {}
inside  {4: 4}
outside {}
Davoud Taghawi-Nejad
  • possible duplicate of [Python multiprocessing: How do I share a dict among multiple processes?](http://stackoverflow.com/questions/6832554/python-multiprocessing-how-do-i-share-a-dict-among-multiple-processes) – Peter Wood Sep 09 '15 at 13:17
  • In your real code, which part of this is doing processor-intensive work? – KobeJohn Sep 09 '15 at 13:43
  • No, the real code is too long to be posted here; the processor-intensive task would be in the `apply` function of Agent. – Davoud Taghawi-Nejad Sep 09 '15 at 14:45

2 Answers


Please check Sharing state between processes and read the section Server process. It appears that you have to create a manager and use that manager to create the dict instances used in your Agent class.

from multiprocessing import Process, Manager

def f(d, l):
    d[1] = '1'
    d['2'] = 2
    d[0.25] = None
    l.reverse()

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()
        l = manager.list(range(10))

        p = Process(target=f, args=(d, l))
        p.start()
        p.join()

        print(d)
        print(l)
Noctis Skytower

Noctis Skytower has one solution, but the more general answer is that you shouldn't be trying to share state between processes unless you have a good reason.

To understand why the behavior is what it is, check out this answer to a similar question. When you make changes to the object in another process, you are actually making changes to a copy of that object. That is, your object is recreated in the subprocess rather than used directly, so mutations there never reach the parent's copy.

As the docs point out, you generally want to pass simple messages between processes instead of heavy objects. That might mean you need to redesign your workflow.

KobeJohn