35

Here is some simple multiprocessing code:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    d[1].append(4)
    print d

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

The output I get is:

{1: []}

Why don't I get {1: [4]} as the output?

Bruce

4 Answers

38

Here is what you wrote:

# from here, code executes in the main process and in all child processes
# (under the spawn start method, every new process re-runs these imports)
from multiprocessing import Process, Manager

# every such process creates its own 'manager' and 'd'
manager = Manager()
# BTW, the Manager is itself a child process, and under spawn its
# initialization re-imports this module, creating a new Manager,
# which creates another, and another, and another...
# Did you check how many Python processes were running on your system? A lot!
d = manager.dict()

def f():
    # 'd' here is the 'd' defined at module level in this, the current process
    d[1].append(4)
    print d

if __name__ == '__main__':
    # from here, code executes ONLY in the main process
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

Here is what you should have written:

from multiprocessing import Process, Manager
def f(d):
    d[1] = d[1] + [4]
    print d

if __name__ == '__main__':
    manager = Manager() # create only 1 mgr
    d = manager.dict() # create only 1 dict
    d[1] = []
    p = Process(target=f, args=(d,)) # tell 'f' which 'd' it should append to
    p.start()
    p.join()
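
Running this version should print {1: [4]} from the child process: d[1] = d[1] + [4] goes through d.__setitem__(), so the manager propagates the change. It is also safe on Windows, since the Manager is created inside the if __name__ == '__main__' guard.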
akaRem
  • That doesn't actually work, you get: `{1: []}` instead of `{1: [4]}` – crysis405 Apr 30 '16 at 15:43
  • @crysis405 I fixed that. Looks like the manager's dict is not fully dumped while transferring between processes, so we need to replace the original value with another list that has a different id. – akaRem May 02 '16 at 09:36
  • @akaRem, you saved my day. This should be stated very clearly somewhere: Manager() should be a single global object for the whole application. – Maxim Galushka Jan 27 '17 at 11:31
  • @akaRem, can I assign some existing dictionary here? For example, in place of `d[1]=[]`, how can I do `d = d1`, where d1 is some existing dictionary? I think it would replace the manager object altogether. – Ruchit Patel May 10 '20 at 13:47
  • @MikePatel Nope, it doesn't work. I tried something similar, and it failed. The managerized dict is a proxy/channel among processes; that's why it has to be a mutable container. You can put data into this proxy container, but things go awry if you try to rebind it to another dict in the new/child process (see the sketch below). – Qiang Xu Sep 19 '20 at 13:51
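
Regarding the last two comments, a minimal sketch of one workaround, assuming d1 is a hypothetical pre-existing plain dict: instead of rebinding d (which just discards the proxy), copy the items into the managed dict with update():

from multiprocessing import Process, Manager

def f(d):
    d['b'] = d['b'] + [4]  # reassign so the change propagates
    print d

if __name__ == '__main__':
    manager = Manager()
    d = manager.dict()
    d1 = {'a': 1, 'b': []}  # hypothetical pre-existing plain dict
    d.update(d1)  # copy items into the proxy; 'd = d1' would merely rebind the name
    p = Process(target=f, args=(d,))
    p.start()
    p.join()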
26

The reason that the new item appended to d[1] is not printed is stated in Python's official documentation:

Modifications to mutable values or items in dict and list proxies will not be propagated through the manager, because the proxy has no way of knowing when its values or items are modified. To modify such an item, you can re-assign the modified object to the container proxy.

Therefore, this is actually what happens:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # invoke d.__getitem__(), returning a local copy of the empty list assigned by the main process,
    # (consider that a KeyError exception wasn't raised, so a list was definitely returned),
    # and append 4 to it, however this change is not propagated through the manager,
    # as it's performed on an ordinary list with which the manager has no interaction
    d[1].append(4)
    # convert d to string via d.__str__() (see https://docs.python.org/2/reference/datamodel.html#object.__str__),
    # returning the "remote" string representation of the object (see https://docs.python.org/2/library/multiprocessing.html#multiprocessing.managers.SyncManager.list),
    # to which the change above was not propagated
    print d

if __name__ == '__main__':
    # invoke d.__setitem__(), propagating this assignment (mapping 1 to an empty list) through the manager
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

Reassigning d[1] after it has been updated, whether with a new list or even with the same list again, triggers the manager to propagate the change:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # perform the exact same steps, as explained in the comments to the previous code snippet above,
    # but in addition, invoke d.__setitem__() with the changed item in order to propagate the change
    l = d[1]
    l.append(4)
    d[1] = l
    print d

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

The line d[1] += [4] would have worked as well.
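
That works because an augmented assignment on a subscript expands to a __getitem__() followed by a __setitem__(), and it is the trailing __setitem__() that the manager sees. A minimal sketch of that variant:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # roughly: tmp = d[1]; tmp += [4]; d[1] = tmp
    # the final d.__setitem__() propagates the change through the manager
    d[1] += [4]
    print d

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()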


EDIT for Python 3.6 or later:

Since Python 3.6, per this changeset following this issue, it's also possible to use nested Proxy Objects which automatically propagate any changes performed on them to the containing Proxy Object. Thus, replacing the line d[1] = [] with d[1] = manager.list() would correct the issue as well:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    d[1].append(4)
    # the __str__() method of a dict object invokes __repr__() on each of its items,
    # so explicitly invoking __str__() is required in order to print the actual list items
    print({k: str(v) for k, v in d.items()})

if __name__ == '__main__':
    d[1] = manager.list()
    p = Process(target=f)
    p.start()
    p.join()
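
Running this under Python 3.6 or later should print {1: '[4]'}, confirming that the append was propagated to the nested list proxy.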

Unfortunately, this bug fix was not ported to Python 2.7 (as of Python 2.7.13).


NOTE (running under the Windows operating system):

Although the described behaviour applies to the Windows operating system as well, the attached code snippets would fail when executed under Windows due to its different process creation mechanism, which relies on the CreateProcess() API rather than the fork() system call (a call Windows does not support).

Whenever a new process is created via the multiprocessing module, Windows creates a fresh Python interpreter process that imports the main module, with potentially hazardous side effects. In order to circumvent this issue, the following programming guideline is recommended:

Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).

Therefore, executing the attached code snippets as is under Windows would try to create an infinite number of processes due to the manager = Manager() line. This can be easily fixed by creating the Manager and Manager.dict objects inside the if __name__ == '__main__' clause and passing the Manager.dict object as an argument to f(), as done in this answer.
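
For reference, here is a minimal Windows-safe sketch of that fix, with the Manager and the dict proxy created inside the guard and the proxy passed explicitly to f():

from multiprocessing import Process, Manager

def f(d):
    d[1] = d[1] + [4]  # reassign so the manager propagates the change
    print d

if __name__ == '__main__':
    # created only in the main process, so a child process re-importing
    # this module does not spawn yet another Manager
    manager = Manager()
    d = manager.dict()
    d[1] = []
    p = Process(target=f, args=(d,))
    p.start()
    p.join()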

More details on the issue may be found in this answer.

Yoel
13

I think this is a bug in the manager proxy calls. You can work around it by avoiding direct method calls on the shared list and reassigning it after modification, like:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # get the shared list
    shared_list = d[1]

    shared_list.append(4)

    # forces the shared list to 
    # be serialized back to manager
    d[1] = shared_list

    print d

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

    print d
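
On a fork-based platform this should print {1: [4]} twice, once from the child process and once from the parent after join().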
Carlo Pires
2
from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()
l = manager.list()

def f():
    # 'l' is itself a managed list, so this append goes through the manager
    l.append(4)
    # store the shared list under key 1
    d[1] = l
    print d

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()
zz yzz
  • Welcome to Stack Overflow! While this might solve the asker's problem, it doesn't explain why. It would be good to add some explanation. – James Fenwick Mar 10 '16 at 09:55