
I'm trying to modify a dictionary (file) with a multiprocessing pool. However, I can't make it happen.

Here is what I'm trying:

import json
import multiprocessing



def teste1(_dict, _iterable):
    file1[f'{_iterable}'] = {'relevant': True}


file1 = {'item1': {'relevant': False}, 'item2': {'relevant': False}}

pool = multiprocessing.Pool(4)
manager = multiprocessing.Manager()
dicto = manager.dict()
pool.apply_async(teste1, (file1, file1))
print(file1)

However, it's still printing out the same as before: {'item1': {'relevant': False}, 'item2': {'relevant': False}}

Could one noble soul help me out with this?

  • It is a bad practice to call a variable `file` as it overlaps with the default name in the std library. – sophros Jan 21 '21 at 15:34
  • Well, that modifies that dictionary. However, how could I iterate over it on the given function? – ankh Jan 21 '21 at 15:56

1 Answer


There are multiple issues with your approach:

  1. You are attempting to share a dictionary (file1) across multiple processes, but each worker actually receives a *copy* of it, so mutations made in the workers never reach the parent process. To solve this, please refer to: multiprocessing: How do I share a dict among multiple processes?

  2. You pass `(file1, file1)` as the argument tuple to `apply_async`, so the function receives the whole dictionary as `_iterable` and ends up indexing with the dictionary itself instead of iterating over its keys. Also, `apply_async` returns immediately; since you never call `.get()` on the result, any exception is silently swallowed and `print(file1)` runs before the work is done.

Assuming that what you need is a dictionary with values updated by parallel processes, you have two choices:

A. Share the dictionary across processes (e.g. with a `Manager().dict()`) and iterate over its keys:

pool.starmap(teste1, [(shared_dict, key) for key in shared_dict.keys()])  # assuming shared_dict is a manager.dict() shared across processes

B. A simpler approach: build the resulting dictionary from the values returned by the parallel runs of the teste1 function:

import multiprocessing


def teste1(dict_key):
    # some logic dependent on dict_key
    return {'relevant': True}


file1 = {'item1': {'relevant': False}, 'item2': {'relevant': False}}

pool = multiprocessing.Pool(4)
results = pool.map(teste1, file1.keys())  # blocks until all workers finish
pool.close()
pool.join()

# dicts preserve insertion order, so results and file1.keys() correspond
file2 = dict(zip(file1.keys(), results))
print(file2)