
I'm trying to modify a dictionary (file) with a multiprocessing pool. However, I can't make it happen.

Here is what I'm trying:

import json
import multiprocessing



def teste1(_dict, _iterable):
    file1[f'{_iterable}'] = {'relevant': True}


file1 = {'item1': {'relevant': False}, 'item2': {'relevant': False}}

pool = multiprocessing.Pool(4)
manager = multiprocessing.Manager()
dicto = manager.dict()
pool.apply_async(teste1, (file1, file1))
print(file1)

However, it's still printing out the same as before: {'item1': {'relevant': False}, 'item2': {'relevant': False}}

Could one noble soul help me out with this?

  • It is a bad practice to call a variable `file` as it overlaps with the default name in the std library. – sophros Jan 21 '21 at 15:34
  • Well, that modifies that dictionary. However, how could I iterate over it on the given function? – ankh Jan 21 '21 at 15:56

1 Answer


There are multiple issues with your approach:

  1. You are attempting to share a dictionary (file1) across multiple processes, but each worker actually receives a *copy* of it, so mutations made in the workers never reach the parent process. To solve this, please refer to: multiprocessing: How do I share a dict among multiple processes?

  2. You pass `(file1, file1)` as the argument tuple to `apply_async`, so the function receives the whole dictionary as `_iterable` and ends up indexing with the dictionary itself instead of iterating over its keys. Also, `apply_async` returns immediately; since you never call `.get()` on the result, any exception is silently swallowed and `print(file1)` runs before the work is done.

Assuming that what you need is a dictionary with values updated by parallel processes, you have two choices:

A. Share the dictionary across processes (e.g. with a `Manager().dict()`) and iterate over its keys:

pool.starmap(teste1, [(shared_dict, key) for key in shared_dict.keys()])  # assuming shared_dict is a manager.dict() shared across processes

B. A simpler approach: build the resulting dictionary from the values returned by the parallel runs of the teste1 function:

import multiprocessing


def teste1(dict_key):
    # some logic dependent on dict_key
    return {'relevant': True}


file1 = {'item1': {'relevant': False}, 'item2': {'relevant': False}}

pool = multiprocessing.Pool(4)
results = pool.map(teste1, file1.keys())  # blocks until all workers finish
pool.close()
pool.join()

# dicts preserve insertion order, so results and file1.keys() correspond
file2 = dict(zip(file1.keys(), results))
print(file2)