
How do you run a function repeatedly in parallel?

For example, I have a function that takes no parameters and has a stochastic element. I want to run it multiple times, as illustrated below with a for loop. How can I accomplish the same thing in parallel?

import numpy as np

def f():
    x = np.random.uniform()
    return x*x    

np.random.seed(1)    
a = []
for i in range(10):
    a.append(f())

This is close to a duplicate of parallel-python-just-run-function-n-times; however, that answer doesn't quite fit because it passes different inputs into the function, and How do I parallelize a simple Python loop? also gives examples of passing different parameters into the function rather than repeating the same call.

I am on Windows 10 and using Jupyter.


In regards to my real use:

Does it produce a large volume of output per call?
Each iteration of the loop produces one number.

Do you need to keep the output? How long does each invocation take roughly?
Yes, I need to retain the numbers and it takes ~30 minutes per iteration.

How many times do you need to run it in total?
At least 100.

Do you want to parallelize across multiple machines or just multiple cores?
Currently just across multiple cores.

    Does it produce a large volume of output per call? Do you need to keep the output? How long does each invocation take roughly? How many times do you need to run it in total? Do you want to parallelize across multiple machines or just multiple cores? – Mark Setchell Aug 20 '19 at 21:07
  • Hi @MarkSetchell; I have edited in further details. Thanks – user2957945 Aug 20 '19 at 21:12
  • Does it involve IO operations? Is it heavy on computation? – Marat Aug 20 '19 at 21:13
  • Hi @Marat; no, there is no IO overhead and no large memory requirement. It is an optimisation / scheduling task. – user2957945 Aug 20 '19 at 21:15
  • Still, is it heavy on computation? This matters for choosing between a thread pool and a process pool, and for understanding how many tasks you can execute in parallel. – Marat Aug 20 '19 at 21:17
  • @Marat; I'm not sure how to quantify the computation, sorry. I am running various simulations where each iteration involves multiple optimisation calls. Each iteration can take ~30 minutes. So what I hope to do is evaluate this function with multiple optimisations in parallel rather than sequentially. – user2957945 Aug 20 '19 at 21:21
  • Assuming each run takes 100% of a CPU core, ThreadPool will hit the GIL limitation. Also, it won't scale beyond the number of physical cores. @noufel13's answer (the first part) looks about right for your case – Marat Aug 20 '19 at 21:31
  • Thanks @Marat, you just preempted the question I was going to ask. From your comment and https://stackoverflow.com/questions/46045956/whats-the-difference-between-threadpool-vs-pool-in-python-multiprocessing-modul, it seems multiprocessing.Pool is the way to go, as the calculations will be more CPU-heavy. – user2957945 Aug 20 '19 at 21:35

1 Answer


If you don't want to pass any input to your function, just accept a throwaway variable _ as the argument and parallelise it as shown in the code below.

import numpy as np
from multiprocessing.pool import Pool

def f(_):
    # The argument is a throwaway value that f simply ignores.
    x = np.random.uniform()
    return x * x

if __name__ == "__main__":
    processes = 5  # specify the number of worker processes here
    with Pool(processes) as p:
        results = p.map(f, range(10))  # call f 10 times; map collects the return values
    print(results)
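
One caveat for the stochastic element: calling np.random.seed(1) in the parent process does not control the workers, because each worker process has its own random state. Below is a minimal sketch of per-worker seeding using Pool's initializer argument; the init_worker helper and the base seed of 1 are illustrative assumptions, and pid-based offsets only decorrelate the workers (for exact reproducibility across runs, pass a per-task seed through the mapped argument instead).

import os
import numpy as np
from multiprocessing.pool import Pool

def init_worker(base_seed):
    # Hypothetical helper: give each worker process a distinct random state.
    # pids differ between runs, so this is not reproducible run-to-run.
    np.random.seed(base_seed + os.getpid())

def f(_):
    x = np.random.uniform()
    return x * x

if __name__ == "__main__":
    with Pool(5, initializer=init_worker, initargs=(1,)) as p:
        results = p.map(f, range(10))
    print(results)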

Update: To answer your updated question: if your tasks aren't too heavy on computation and are mostly I/O bound, then I recommend using ThreadPool (multithreading) instead of Pool (multiprocessing).

Code to create a ThreadPool:

from multiprocessing.pool import ThreadPool

threads = 5
with ThreadPool(threads) as t:
    results = t.map(f, range(10))  # same pattern: call f 10 times and keep the results
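
Given the comments above (each call is CPU-bound and takes ~30 minutes), a ThreadPool will be limited by the GIL, so a process pool is the better fit here. As an alternative sketch, concurrent.futures.ProcessPoolExecutor can submit a no-argument function directly and collect the results; the worker count of 4 and the 100 runs are assumptions taken from the comments, and on Windows/Jupyter f must live in an importable .py module (mymodule below is hypothetical) so the spawned worker processes can find it.

from concurrent.futures import ProcessPoolExecutor, as_completed
from mymodule import f  # hypothetical module containing the no-argument f()

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as ex:
        futures = [ex.submit(f) for _ in range(100)]  # same call, 100 times
        a = [fut.result() for fut in as_completed(futures)]  # keep every result
    print(len(a), "results collected")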
noufel13