I am trying to streamline a program that involves a set of short tasks that can be done in parallel, where the results of the set of tasks must be compared before moving onto the next step (which again involves a set of short tasks, and then another set, etc.). Due to the level of complexity of these tasks, it's not worthwhile to use multiprocessing
due to the set-up time. I am wondering if there is another way to do these short tasks in parallel that is faster than linear. The only question I can find on this site that describes this problem for Python references this answer on memory sharing which I don't think answers my question (or if it does I could not follow how).
To illustrate what I am hoping to do, consider the problem of summing a bunch of numbers from 0
to N
. (Of course this can be solved analytically, my point is to come up with a low-memory but short CPU-intensive task.) First, the linear approach would simply be:
def numbers(a,b):
return(i for i in range(a,b))
def linear_sum(a):
return(sum(numbers(a[0],a[1])))
n = 2000
linear_sum([0, n+1])
#2001000
For threading, I want to break the problem into parts that can then be summed separately and then combined, so the idea would be to get a bunch of ranges over which to sum with something like
def get_ranges(i, Nprocess = 3):
di = i // Nprocess
j = np.append(np.arange(0, i, di), [i+1,])
return([(j[k], j[k+1]) for k in range(len(j)-1)])
and for some value n >> NProcesses
the pseudocode example would be something like
values = get_ranges(n)
x = []
for value in values:
x.append(do_someting_parallel(value))
return(sum(x))
The question then, is how to implement do_someting_parallel
? For multiprocessing
, we can do something like:
from multiprocessing import Pool as ThreadPool
def mpc_thread_sum(i, Nprocess = 3):
values = get_ranges(i)
pool = ThreadPool(Nprocess)
results = pool.map(linear_sum, values)
pool.close()
pool.join()
return(sum(results))
print(mpc_thread_sum(2000))
# 2001000
The graph below shows the performance of the different approaches described. Is there a way to speed up computations for the region where multiprocessing
is still slower than linear or is this the limit of parallelization in Python's GIL? I suspect the answer may be that I am hitting my limit but wanted to ask here to be sure. I tried multiprocessing.dummy
, asyncio
, threading
, and ThreadPoolExecutor
(from concurrent.futures
). For brevity, I have omitted the code, but all show comparable execution time to the linear approach. All are designed for I/O tasks, so are confined by GIL.