I have the following (simplified) code:

def main_func():
    anotherDic = {}
    dic = {(1, 2): 44, (4, 6): 33, (1, 1): 4, (2, 3): 4}
    ks = dic.keys()
    for i in ks:
        # func_A uses the key tuple and the value to update anotherDic
        func_A(anotherDic, i[0], i[1], dic[i], 5)

The main dictionary (dic) is quite big, and the loop runs for 500 million iterations. I want to use multiprocessing to parallelize the loop on a multi-core machine. I have read several SO questions, the multiprocessing library documentation, and a very helpful video, and I still cannot figure it out. I want the program to fork into several processes when it reaches this loop, run them in parallel, and then, once all the processes have finished, continue on a single process from the line after the loop.

func_A receives a key and value from dic, does some simple calculations, and updates the data in anotherDic. The iterations are independent of each other as long as all keys with the same i[0] are handled by the same process. Because of that constraint, I cannot use the pool map function, which divides the data between the cores automatically. Instead, I plan to sort the keys by the first element of the key tuple and divide them between the processes manually.

How can I pass/share the very big dictionary (dic) between the processes? Different processes will read and write different keys (i.e. the keys each process deals with do not overlap with those of the other processes). If I cannot find an answer to this, I will just use a smaller temporary dict for each process and merge the dicts at the end.

The second question is: how can I fork into multiple processes just for the loop section, and have all of them join before continuing with the rest of the code on a single process? A rough sketch of what I have in mind is below.
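
Here is roughly the structure I am imagining (the func_A body is just a stand-in for my real calculation, and the worker function and the key grouping are only placeholders for how I plan to split the work):

from collections import defaultdict
from multiprocessing import Process, Queue

def func_A(result, k0, k1, value, n):
    # placeholder for the real calculation
    result[(k0, k1)] = value * n

def worker(keys, dic, out_queue):
    # each worker fills its own local dict, then ships it back
    local = {}
    for i in keys:
        func_A(local, i[0], i[1], dic[i], 5)
    out_queue.put(local)

def main_func():
    dic = {(1, 2): 44, (4, 6): 33, (1, 1): 4, (2, 3): 4}

    # group the keys so that all keys sharing i[0] go to the same process
    groups = defaultdict(list)
    for i in dic:
        groups[i[0]].append(i)

    out_queue = Queue()
    # note: depending on the start method, dic may be copied into each child
    procs = [Process(target=worker, args=(keys, dic, out_queue))
             for keys in groups.values()]
    for p in procs:
        p.start()

    # drain the queue before joining, then continue single-process
    anotherDic = {}
    for _ in procs:
        anotherDic.update(out_queue.get())
    for p in procs:
        p.join()
    return anotherDic

if __name__ == '__main__':
    print(main_func())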

cybergeek654

1 Answer

A general answer involves using a Manager object. Adapted from the docs:

from multiprocessing import Process, Manager

def f(d):
    # both processes mutate the same shared dict through its proxy
    d[1] += '1'
    d['2'] += 2

if __name__ == '__main__':
    manager = Manager()

    # manager.dict() lives in the manager's server process;
    # the workers only receive a proxy to it
    d = manager.dict()
    d[1] = '1'
    d['2'] = 2

    p1 = Process(target=f, args=(d,))
    p2 = Process(target=f, args=(d,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

    print(d)

Output:

$ python mul.py 
{1: '111', '2': 6}
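
Applied to your loop, a minimal sketch could look like the following (the worker function, the func_A stub, and the grouping by i[0] are just assumptions about how you intend to split the keys). Be aware that every read and write on the managed dict goes through the manager's server process, so with 500 million iterations the overhead may be significant; filling ordinary per-process dicts and merging them at the end, as you suggested, may well be faster.

from collections import defaultdict
from multiprocessing import Process, Manager

def func_A(result, k0, k1, value, n):
    # stand-in for your real calculation
    result[(k0, k1)] = value * n

def worker(keys, dic, shared):
    # every worker writes into the same managed dict through its proxy
    for i in keys:
        func_A(shared, i[0], i[1], dic[i], 5)

if __name__ == '__main__':
    dic = {(1, 2): 44, (4, 6): 33, (1, 1): 4, (2, 3): 4}

    # keys with the same first element stay in the same group/process
    groups = defaultdict(list)
    for i in dic:
        groups[i[0]].append(i)

    manager = Manager()
    anotherDic = manager.dict()

    procs = [Process(target=worker, args=(keys, dic, anotherDic))
             for keys in groups.values()]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    print(dict(anotherDic))  # plain copy once all workers have finished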

Original answer: Python multiprocessing: How do I share a dict among multiple processes?

Raskayu
  • Thanks Raskayu. Is the Manager method efficient for extremely large dictionaries? Does it create copies of dic, or do the processes use the same dic in memory? In my case, the keys that each process writes to do not overlap. – cybergeek654 Aug 18 '16 at 09:08
  • @cybergeek654 "When you create a multiprocessing.Manager, a separate server process is spawned, which is responsible for hosting all the objects created by the Manager." As you can see, it is created only once, so there will be one copy in memory. – Raskayu Aug 18 '16 at 09:50