Yes, that is doable. Your calculation does not depend on intermediate results, so you can easily divide the task into chunks and distribute them over multiple processes. This is what is called an embarrassingly parallel problem.
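To make the independence concrete, here is a minimal sketch. The value of `N` and the chunk boundaries are arbitrary picks for illustration, not taken from your code. Each partial sum only looks at its own chunk, and adding the partial sums gives the same total as a single pass over the whole range, which is why the chunks can be handed to separate processes.

```python
# Illustrative only: N and the chunk boundaries are arbitrary choices.
N = 1_000

# Three hand-picked, non-overlapping chunks that together cover range(N).
chunks = [range(0, 400), range(400, 750), range(750, N)]

# Each partial sum depends only on its own chunk, never on another chunk.
partial_sums = [sum(chunk) for chunk in chunks]

# Combining the partial results reproduces the single-pass result.
total = sum(partial_sums)
print(total == sum(range(N)))  # True
```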
The only tricky part here might be dividing the range into fairly equal parts in the first place. Straight out of my personal lib, here are two functions to deal with this:
# mp_utils.py

from itertools import accumulate


def calc_batch_sizes(n_tasks: int, n_workers: int) -> list:
    """Divide `n_tasks` optimally between n_workers to get batch_sizes.

    Guarantees batch sizes won't differ by more than 1.

    Example:
    # >>> calc_batch_sizes(23, 4)
    # Out: [6, 6, 6, 5]

    In case you're going to use numpy anyway, use np.array_split:
    [len(a) for a in np.array_split(np.arange(23), 4)]
    # Out: [6, 6, 6, 5]
    """
    x = n_tasks // n_workers  # floor division stays exact for large ints
    y = n_tasks % n_workers
    batch_sizes = [x + (y > 0)] * y + [x] * (n_workers - y)
    return batch_sizes

def build_batch_ranges(batch_sizes: list) -> list:
    """Build batch_ranges from list of batch_sizes.

    Example:
    # batch_sizes [6, 6, 6, 5]
    # >>> build_batch_ranges(batch_sizes)
    # Out: [range(0, 6), range(6, 12), range(12, 18), range(18, 23)]
    """
    upper_bounds = [*accumulate(batch_sizes)]
    lower_bounds = [0] + upper_bounds[:-1]
    batch_ranges = [range(l, u) for l, u in zip(lower_bounds, upper_bounds)]
    return batch_ranges

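As a quick sanity check of the splitting idea, the following self-contained sketch folds both helpers into one hypothetical function, `split_range` (a name made up here, not part of the module above). It assumes `n_workers` is at least 1 and verifies the two properties that matter: the batch sizes differ by at most one, and the ranges cover every index exactly once.

```python
from itertools import accumulate


def split_range(n_tasks: int, n_workers: int) -> list:
    """Hypothetical helper: split range(n_tasks) into n_workers contiguous ranges."""
    base, extra = divmod(n_tasks, n_workers)
    # The first `extra` batches get one additional task each.
    sizes = [base + 1] * extra + [base] * (n_workers - extra)
    uppers = list(accumulate(sizes))
    lowers = [0] + uppers[:-1]
    return [range(lo, hi) for lo, hi in zip(lowers, uppers)]


ranges = split_range(23, 4)
print(ranges)  # [range(0, 6), range(6, 12), range(12, 18), range(18, 23)]

sizes = [len(r) for r in ranges]
# Batch sizes differ by at most one ...
assert max(sizes) - min(sizes) <= 1
# ... and the ranges cover 0..22 exactly once, in order.
assert [i for r in ranges for i in r] == list(range(23))
```

If there are more workers than tasks, the trailing ranges are simply empty, which `sum` handles without any special casing.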
Then your main script would look like this:
import time
from multiprocessing import Pool

from mp_utils import calc_batch_sizes, build_batch_ranges


def target_foo(batch_range):
    return sum(batch_range)  # ~ 6x faster than target_foo1


def target_foo1(batch_range):
    numbers = []
    for num in batch_range:
        numbers.append(num)
    return sum(numbers)


if __name__ == '__main__':

    N = 100000000
    N_CORES = 4

    batch_sizes = calc_batch_sizes(N, n_workers=N_CORES)
    batch_ranges = build_batch_ranges(batch_sizes)

    start = time.perf_counter()
    with Pool(N_CORES) as pool:
        result = pool.map(target_foo, batch_ranges)
        r_sum = sum(result)
    print(r_sum)
    print(f'elapsed: {time.perf_counter() - start:.2f} s')

Note that I also switched your for-loop for a simple sum over the range object, since it offers much better performance. If you can't do this in your real app, a list comprehension would still be ~60% faster than filling your list manually like in your example.
Example Output:
4999999950000000
elapsed: 0.51 s
Process finished with exit code 0
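If you want to check the performance note on your own machine, a rough sketch like the following compares the three ways of summing mentioned above. The function names and the reduced `N` are placeholders chosen for this example, and the measured ratios will vary with hardware and Python version, so treat the printed numbers as indicative only.

```python
import timeit

N = 200_000  # much smaller than in the main script, to keep the check quick


def manual_append(batch_range):
    # Fill a list element by element, then sum it.
    numbers = []
    for num in batch_range:
        numbers.append(num)
    return sum(numbers)


def list_comprehension(batch_range):
    # Build the list in one expression, then sum it.
    return sum([num for num in batch_range])


def direct_sum(batch_range):
    # Sum the range object directly, without building a list.
    return sum(batch_range)


candidates = (manual_append, list_comprehension, direct_sum)

# All three variants must agree on the result before timing them.
results = {func.__name__: func(range(N)) for func in candidates}

for func in candidates:
    elapsed = timeit.timeit(lambda: func(range(N)), number=3)
    print(f'{func.__name__}: {elapsed:.4f} s for 3 runs')
```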