I have a lot of very large matrices, AFeatures, that I am comparing against some other very large matrices, BFeatures, using the Euclidean distance. Both have a shape of (878, 2, 4, 15, 17, 512). I am trying to parallelise this process to speed up the comparison.
I am using Python 3 in a Conda environment and my original code uses an average of two CPU cores at 100%:
import numpy as np

per_slice_comparisons = np.zeros(shape=(878, 878, 2, 4))
for i in range(878):
    for j in range(878):
        for k in range(2):
            for l in range(4):
                per_slice_comparisons[i, j, k, l] = np.linalg.norm(AFeatures[i, k, l, :] - BFeatures[j, k, l, :])
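To make the loop above reproducible without the real data, here is a scaled-down sketch using random arrays. The small shapes and the names `compare_slices`, `AFeatures_small`, and `BFeatures_small` are placeholders for illustration only, not the actual setup.

```python
import numpy as np

def compare_slices(a_features, b_features):
    """Euclidean distance between every (i, j) pair, per (k, l) slice."""
    n_a, n_k, n_l = a_features.shape[:3]
    n_b = b_features.shape[0]
    out = np.zeros((n_a, n_b, n_k, n_l))
    for i in range(n_a):
        for j in range(n_b):
            for k in range(n_k):
                for l in range(n_l):
                    # With no ord/axis given, norm flattens the slice and
                    # returns the square root of the sum of squares.
                    out[i, j, k, l] = np.linalg.norm(
                        a_features[i, k, l] - b_features[j, k, l]
                    )
    return out

# Placeholder data: same layout as the real arrays, much smaller sizes.
rng = np.random.default_rng(0)
AFeatures_small = rng.standard_normal((5, 2, 4, 3, 3, 8))
BFeatures_small = rng.standard_normal((5, 2, 4, 3, 3, 8))

result = compare_slices(AFeatures_small, BFeatures_small)
print(result.shape)
```

Scaling the first dimension back up to 878 reproduces the cost of the original loop: 878 * 878 * 2 * 4 separate calls, each on a (15, 17, 512) slice.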
I have tried two approaches for speeding up the code.
Using multi-processing
from multiprocessing import Pool

def fill_array(i):
    # Distances between AFeatures[i] and every BFeatures[j], per (k, l) slice.
    comparisons = np.zeros(shape=(878, 2, 4))
    for j in range(878):
        for k in range(2):
            for l in range(4):
                comparisons[j, k, l] = np.linalg.norm(AFeatures[i, k, l, :] - BFeatures[j, k, l, :])
    return comparisons

pool = Pool(processes=6)
list_start_vals = range(878)
per_slice_comparisons = np.array(pool.map(fill_array, list_start_vals))
pool.close()
This approach increases run time by around 5%, even though all 8 CPU cores are now used at 100%. I have tried several different numbers of processes; the more there are, the slower it gets.
Using numexpr
This is a slightly different approach, where I use the numexpr library to do a faster linalg.norm operation. For a single operation, this approach reduces runtime by a factor of 10.
import os

# The thread limits have to be set before numexpr is imported.
os.environ['NUMEXPR_MAX_THREADS'] = '8'
os.environ['NUMEXPR_NUM_THREADS'] = '4'
import numexpr as ne

def linalg_norm(a):
    sq_norm = ne.evaluate('sum(a**2)')
    return ne.evaluate('sqrt(sq_norm)')

per_slice_comparisons = np.zeros(shape=(878, 878, 2, 4))
for i in range(878):
    for j in range(878):
        for k in range(2):
            for l in range(4):
                per_slice_comparisons[i, j, k, l] = linalg_norm(AFeatures[i, k, l, :] - BFeatures[j, k, l, :])
However, inside the nested for loop this approach increases total execution time by a factor of 3. I don't understand why simply putting this operation in a nested for loop would decrease performance so dramatically. If anyone has any ideas on how to fix this, I would really appreciate it!
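For comparison, one loop-free formulation that might be worth timing against the versions above uses the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so that all pairs are handled by a few large NumPy calls. This is only a sketch under assumptions: the function name `pairwise_slice_distances` and the small random test arrays are hypothetical, and it has not been benchmarked on the full (878, 2, 4, 15, 17, 512) data.

```python
import numpy as np

def pairwise_slice_distances(a_features, b_features):
    """Loop-free Euclidean distances for every (i, j) pair and (k, l) slice."""
    n_a, n_k, n_l = a_features.shape[:3]
    n_b = b_features.shape[0]
    # Flatten the trailing feature axes of each slice into one vector.
    a = a_features.reshape(n_a, n_k, n_l, -1)
    b = b_features.reshape(n_b, n_k, n_l, -1)
    a_sq = np.einsum('ikld,ikld->ikl', a, a)    # ||a||^2 per slice
    b_sq = np.einsum('jkld,jkld->jkl', b, b)    # ||b||^2 per slice
    cross = np.einsum('ikld,jkld->ijkl', a, b)  # a.b for every (i, j)
    sq = a_sq[:, None] + b_sq[None, :] - 2.0 * cross
    # Rounding can leave tiny negative values; clip before the square root.
    np.maximum(sq, 0.0, out=sq)
    return np.sqrt(sq)

# Small placeholder data to check the result against the explicit loop.
rng = np.random.default_rng(1)
A_small = rng.standard_normal((4, 2, 4, 3, 3, 8))
B_small = rng.standard_normal((6, 2, 4, 3, 3, 8))

fast = pairwise_slice_distances(A_small, B_small)

slow = np.zeros((4, 6, 2, 4))
for i in range(4):
    for j in range(6):
        for k in range(2):
            for l in range(4):
                slow[i, j, k, l] = np.linalg.norm(A_small[i, k, l] - B_small[j, k, l])

print(np.allclose(fast, slow))
```

The expansion trades some numerical precision for speed, since it subtracts large similar numbers when two slices are nearly identical, so the results should be checked against the explicit loop on real data before relying on it.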