Is there a way to enable automatic parallelization for basic NumPy operations, such as element-wise multiplication of arrays and functions like np.sum and np.average?
I know that it is possible for BLAS/LAPACK functions, as discussed for scipy.linalg.solve in this thread:
Is it possible to know which SciPy / NumPy functions run on multiple cores?
And I managed to run this code natively in parallel via MKL:
import numpy

def test():
    n = 5000
    data = numpy.random.random((n, n))
    result = numpy.linalg.inv(data)

test()
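For what it's worth, this is how I checked that MKL is the loaded BLAS backend and how many threads it uses (threadpoolctl is just one way to inspect this; any similar tool would do):

from threadpoolctl import threadpool_info
from pprint import pprint

# lists the loaded BLAS/LAPACK libraries (e.g. MKL) and their thread counts
pprint(threadpool_info())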
But I would need to run something like this in parallel:
import numpy as np

N = 1024
A = np.zeros((N, N, N), dtype='float32')
B = np.zeros((N, N, N), dtype='float32')
C = np.zeros((N, N, N), dtype='float32')
A[:, :, :] = 1
B[:, :, :] = 2

# this is the part I want parallel
C[:, :, :] = A[:, :, :] * B[:, :, :]

# also this:
avgC = np.average(C)
Otherwise, what would be the simplest way to parallelize these target operations?
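For reference, the closest I have come so far is a sketch with numexpr, which as far as I understand evaluates element-wise expressions (and simple reductions like sum) with multiple threads; I have not verified whether this is the right fit for np.average:

import numpy as np
import numexpr as ne

N = 1024
A = np.full((N, N, N), 1, dtype='float32')
B = np.full((N, N, N), 2, dtype='float32')

# element-wise multiplication, evaluated in parallel by numexpr's thread pool
C = ne.evaluate('A * B')

# multi-threaded sum reduction, divided by the element count to get the average
avgC = ne.evaluate('sum(C)') / C.size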