Improve performance of combined for blocks with numpy

Question

I have the rather simple function func1() defined below, mainly composed of two for blocks.

It works perfectly for small N values, but being that the for blocks are combined it grows very quickly to the point of taking minutes when N>1000.

How can I use numpy broadcasting to improve the performance of this function?

import numpy as np
import time as t


def func1(A_triang, B_triang):
    aa = []
    for i, A_tr in enumerate(A_triang):
        for j, B_tr in enumerate(B_triang):
            # Absolute value of differences.
            abs_diff = abs(np.array(A_tr) - np.array(B_tr))
            # Store the sum of the differences and the indexes
            aa.append([sum(abs_diff), i, j])
    return aa


# Generate random data with the proper format
N = 500
A_triang = np.random.uniform(0., 20., (N, 3))
A_triang[:, 0] = np.ones(N)
B_triang = np.random.uniform(0., 20., (N, 3))
B_triang[:, 0] = np.ones(N)

# Call function.
s = t.clock()
aa = func1(A_triang, B_triang)
print(t.clock() - s)

Divakar · Accepted Answer · 2018-01-16T16:33:28.940

Here's one with NumPy broadcasting, leveraging a modified version of indices_merged_arr_generic_using_cp for index assignment part -

import functools

# Based on https://stackoverflow.com/a/46135435/ @unutbu
def indices_merged_arr_generic_using_cp(arr):
    """
    Based on cartesian_product
    http://stackoverflow.com/a/11146645/190597 (senderle)
    """
    shape = arr.shape
    arrays = [np.arange(s, dtype='int') for s in shape]
    broadcastable = np.ix_(*arrays)
    broadcasted = np.broadcast_arrays(*broadcastable)
    rows, cols = functools.reduce(np.multiply, broadcasted[0].shape),\
                                                  len(broadcasted)+1
    out = np.empty(rows * cols, dtype=arr.dtype)
    start, end = rows, 2*rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    out[0:rows] = arr.flatten()
    return out.reshape(cols, rows).T

def func1_numpy_broadcasting(a,b):
    val = np.abs(a[:,None,:] - b).sum(-1)
    return indices_merged_arr_generic_using_cp(val)

If the first column of the inputs are always 1s, then, we won't need to compute their differences as their sum of differences would be zero. So, alternatively to get val, we can simply use the last two columns -

val = np.abs(a[:,1,None] - b[:,1]) + np.abs(a[:,2,None] - b[:,2])

This would save memory as we are not going 3D this way.

Using numexpr module -

import numexpr as ne
def func1_numexpr_broadcasting(a,b):
    a3D = a[:,None,:]
    val = ne.evaluate('sum(abs(a3D - b),2)')
    return indices_merged_arr_generic_using_cp(val)

Leveraging the fact that the first columns are 1s, we would have -

def func1_numexpr_broadcasting_v2(a,b):
    a1 = a[:,1,None]
    b1 = b[:,1]
    a2 = a[:,2,None]
    b2 = b[:,2]
    val = ne.evaluate('abs(a1-b1) + abs(a2-b2)')
    return indices_merged_arr_generic_using_cp(val)

Excellent answer Divakar. I didn't expect the index storing part to be that complicated, but it works. Thank you! — Gabriel, Jan 16 '18 at 17:08

Improve performance of combined for blocks with numpy

1 Answers1