
I am trying to parallelize a function that takes a lot of time when executed iteratively.

A simplified version would be:

from multiprocessing import Process, Manager

def f(my_dict, key, list1, list2):
    # For each value in list1, count how many values in list2 it belongs to,
    # then store the grand total in the shared dictionary under the given key.
    count = [0] * len(list1)
    for i, val1 in enumerate(list1):
        count[i] = sum(belongs_to(val1, val2) for val2 in list2)
    my_dict[key] = sum(count)

# belongs_to, other_dict and list2 are defined elsewhere in the real code.
manager = Manager()
my_dict = manager.dict()
jobs = [Process(target=f, args=(my_dict, record, value, list2))
        for record, value in other_dict.items()]
_ = [p.start() for p in jobs]
_ = [p.join() for p in jobs]
my_dict = {key: value for key, value in my_dict.items()}

When I run this code, my memory is overrun. Is there an easy way to limit the number of processes running at the same time?

Also, I am sharing the dictionary that receives the results between all the processes thanks to Manager. Is there a way to share the lists passed to the function f, since list2 is always the same?

Anoikis
  • You can use `multiprocessing.Pool()` to spawn a fixed number of worker processes (see the Pool sketch after these comments) – Pavel Dec 11 '17 at 13:59
  • Ok, but a Pool only works with a map or an apply function, which take only one parameter, doesn't it? Is there a syntax to make it work with several arguments? – Anoikis Dec 11 '17 at 14:06
  • You can use multiprocessing with queues. First, write a subclass of Process and override `run()` to grab tasks from the queue. Then, in the main function, you can spawn as many processes as you want and put the tasks into the queue (see the queue-based sketch below). It's better to have a fixed number of processes than to spawn a new process for each task. Links: [pymotw](https://pymotw.com/2/multiprocessing/communication.html), [SO](https://stackoverflow.com/questions/11515944/how-to-use-multiprocessing-queue-in-python) – thuyein Dec 11 '17 at 14:30
  • Ok, I will try this solution. Do you have some links to pages with clear examples? Thank you. – Anoikis Dec 11 '17 at 15:32
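
A minimal sketch of the Pool-based approach suggested in the comments, assuming Python 3 and the same belongs_to, other_dict and list2 as in the question; Pool(processes=...) limits how many workers run at once, and starmap handles functions that take several arguments:

from multiprocessing import Pool

def f(key, list1, list2):
    # Same work as before, but return the result instead of writing to a
    # shared dict; Pool collects the return values for us.
    return key, sum(sum(belongs_to(val1, val2) for val2 in list2)
                    for val1 in list1)

if __name__ == '__main__':
    # At most 4 worker processes exist at any time; tasks are queued to them.
    with Pool(processes=4) as pool:
        results = pool.starmap(f, [(record, value, list2)
                                   for record, value in other_dict.items()])
    my_dict = dict(results)

Note that list2 is still pickled once per task here; on POSIX systems a module-level list2 created before the Pool is constructed is inherited by the worker processes via fork, so it would not need to be passed explicitly.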
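
A sketch of the queue-based approach from the last comment, again assuming the belongs_to, other_dict and list2 from the question; the Worker class and the None sentinel are illustrative choices, not library API:

from multiprocessing import Process, Queue

class Worker(Process):
    # Pulls (key, list1) tasks from a queue until it sees the None sentinel,
    # and pushes (key, total) pairs onto a result queue.
    def __init__(self, task_queue, result_queue, list2):
        super().__init__()
        self.task_queue = task_queue
        self.result_queue = result_queue
        self.list2 = list2

    def run(self):
        while True:
            task = self.task_queue.get()
            if task is None:  # sentinel: no more work
                break
            key, list1 = task
            total = sum(sum(belongs_to(val1, val2) for val2 in self.list2)
                        for val1 in list1)
            self.result_queue.put((key, total))

if __name__ == '__main__':
    tasks, results = Queue(), Queue()
    workers = [Worker(tasks, results, list2) for _ in range(4)]  # fixed pool of 4
    for w in workers:
        w.start()
    for record, value in other_dict.items():
        tasks.put((record, value))
    for _ in workers:
        tasks.put(None)  # one sentinel per worker so each one stops
    my_dict = dict(results.get() for _ in other_dict)  # one result per task
    for w in workers:
        w.join()

This keeps a fixed number of processes alive for the whole run, so memory stays bounded no matter how many entries other_dict has.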

0 Answers