
Consider the following:

def update_dict(d, k):
    d[k] = True

This works as expected:

from functools import partial

my_dict = {}

_ = any(map(partial(update_dict, my_dict), range(5)))

print(my_dict)
# {0: True, 1: True, 2: True, 3: True, 4: True}

However, when using a multiprocessing.Pool, the output is different:

from functools import partial
from multiprocessing import Pool

my_dict = {}
my_pool = Pool(processes=5)

_ = any(my_pool.imap(partial(update_dict, my_dict), range(5)))

print(my_dict)
# {}

It is as if my_dict was never updated at all. What is the reason for this?

Matias Cicero

1 Answer


This is perfectly normal: processes do not share memory (unless you explicitly ask them to). The dict you pass to update_dict is pickled and copied into each worker process of the pool, and each worker updates its own copy. The dict in the parent process is never touched.

A process pool shines when the workers operate on external, genuinely shared resources, such as the file system or network connections from remote clients.

For processing in-memory data, you should instead use a thread pool (which, in Python, is often suboptimal because of the GIL), or share the data through a multiprocessing.Manager, which keeps the object in a server process and gives each worker a proxy to it.

Pierre-Antoine