How could the application of a function to the elements of a NumPy array through numpy.apply_along_axis()
be parallelized so as to take advantage of multiple cores? This seems to be a natural thing to do, in the common case where all the calls to the function being applied are independent.
In my particular case (if this matters), the axis of application is axis 0: np.apply_along_axis(func, axis=0, arr=param_grid) (np being NumPy).
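For reference, a minimal serial version of what I am doing looks like the following (func and the shape of param_grid are dummy stand-ins):

import numpy as np

def func(v):  # Dummy stand-in: takes a 1D array of parameters, returns a scalar
    return v.sum()

param_grid = np.random.rand(3, 100, 200)  # Axis 0 holds the parameter vectors
result = np.apply_along_axis(func, axis=0, arr=param_grid)  # result.shape == (100, 200)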
I had a quick look at Numba, but I can't seem to obtain this parallelization with a loop like:

import numba
import numpy as np

@numba.jit(parallel=True)
def apply_along_axis_0(func, params):
    result = np.empty(shape=params.shape[1:])
    for index in np.ndindex(*result.shape):  # All the indices of params[0, ...]
        result[index] = func(params[(slice(None),) + index])  # Applying func along axis 0
    return result
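From what I understand of the Numba documentation, an explicitly parallel loop requires numba.prange together with parallel=True, and func itself would also have to be compiled by Numba; a sketch of what this might look like (with a dummy func, and with the trailing axes flattened so that prange runs over a single index):

import numpy as np
import numba

@numba.njit
def func(v):  # Dummy stand-in: the real func would also need to be jitted
    return v.sum()

@numba.njit(parallel=True)
def apply_along_axis0(flat):  # flat has shape (num_params, num_points)
    n = flat.shape[1]
    out = np.empty(n)
    for i in numba.prange(n):  # prange distributes the iterations over cores
        out[i] = func(flat[:, i])
    return out

param_grid = np.random.rand(3, 100, 200)
flat = param_grid.reshape(param_grid.shape[0], -1)  # Trailing axes flattened
result = apply_along_axis0(flat).reshape(param_grid.shape[1:])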
There is also apparently a compilation option in NumPy for parallelization through OpenMP, but it does not seem to be accessible through MacPorts.
One can also think of cutting the array into a few pieces and using threads (so as to avoid copying the data), applying the function to each piece in parallel. This is more complex than what I am looking for (and might not work if the Global Interpreter Lock is not released often enough).
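A rough sketch of that idea, with threads each writing into a disjoint slice of a shared output array (again with a dummy func; as noted above, the speed-up depends on func releasing the GIL):

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def func(v):  # Dummy stand-in
    return v.sum()

def apply_on_chunk(out, flat, lo, hi):
    for j in range(lo, hi):  # Each thread fills a disjoint slice of out
        out[j] = func(flat[:, j])

param_grid = np.random.rand(3, 100, 200)
flat = param_grid.reshape(param_grid.shape[0], -1)  # Trailing axes flattened
n = flat.shape[1]
out = np.empty(n)
n_threads = 4
bounds = np.linspace(0, n, n_threads + 1).astype(int)  # Chunk boundaries
with ThreadPoolExecutor(max_workers=n_threads) as ex:
    futures = [ex.submit(apply_on_chunk, out, flat, bounds[k], bounds[k + 1])
               for k in range(n_threads)]
    for f in futures:
        f.result()  # Propagate any exception raised in a worker
result = out.reshape(param_grid.shape[1:])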
It would be very nice to be able to use multiple cores in a simple way for simple parallelizable tasks like applying a function to all the elements of an array (which is essentially what is needed here, with the small complication that the function func() takes a 1D array of parameters).
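For comparison, the most straightforward multi-core approach I can think of uses a multiprocessing pool mapping func over the 1D parameter vectors, at the cost of copying the data to the worker processes (sketched with a dummy func, which must be picklable, i.e. defined at module level):

import numpy as np
from multiprocessing import Pool

def func(v):  # Dummy stand-in; must be defined at module level to be picklable
    return v.sum()

if __name__ == "__main__":
    param_grid = np.random.rand(3, 100, 200)
    flat = param_grid.reshape(param_grid.shape[0], -1)
    with Pool() as pool:
        values = pool.map(func, flat.T)  # One 1D parameter vector per call
    result = np.array(values).reshape(param_grid.shape[1:])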