This can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.
To parallelize your example, you'd need to define your functions with the @ray.remote
decorator, and then invoke them with .remote
.
import numpy as np
import time
import ray
ray.init()
# Define the function. Each remote function will be executed
# in a separate process.
@ray.remote
def HeavyComputationThatIsThreadSafe(i, j):
n = i*j
time.sleep(0.5) # Simulate some heavy computation.
return n
N = 10
output_ids = []
for i in range(N):
for j in range(N):
# Remote functions return a future, i.e, an identifier to the
# result, rather than the result itself. This allows invoking
# the next remote function before the previous finished, which
# leads to the remote functions being executed in parallel.
output_ids.append(HeavyComputationThatIsThreadSafe.remote(i,j))
# Get results when ready.
output_list = ray.get(output_ids)
# Move results into an NxN numpy array.
outputs = np.array(output_list).reshape(N, N)
# This program should take approximately N*N*0.5s/p, where
# p is the number of cores on your machine, N*N
# is the number of times we invoke the remote function,
# and 0.5s is the time it takes to execute one instance
# of the remote function. For example, for two cores this
# program will take approximately 25sec.
There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray see this related post.
Note: One point to keep in mind is that each remote function is executed in a separate process, possibly on a different machine, and thus the remote function's computation should take more than invoking a remote function. As a rule of thumb a remote function's computation should take at least a few 10s of msec to amortize the scheduling and startup overhead of a remote function.