Apply function along axis over two numpy arrays - shapes not aligned

Question

I'm probably not seeing something obvious here but don't believe np.apply_along_axis or np.apply_over_axes is what I'm looking for. Say I have the following two arrays:

arr1 = np.random.randn(10, 5)
arr2 = np.random.randn(10, )

And the following function:

def coefs(x, y):
    return np.dot(np.linalg.inv(np.dot(x.T, x)), np.dot(x.T, y))
    # the vector of coefficients in a multiple linear regression

Calling this on arr1 and arr2 works smoothly as it should:

coefs(arr1, arr2)
Out[111]: array([-0.19474836, -0.50797551,  0.82903805,  0.06332607, -0.26985597])

However, suppose instead of the 1- or 2d arrays I have two 3d arrays:

arr3 = np.array([arr1[:-1], arr1[1:]])
arr4 = np.array([arr2[:-1], arr2[1:]])

As expected, if I apply the function here I get

coefs(arr3, arr4)
Traceback (most recent call last):

  File "<ipython-input-127-4a3e7df02cda>", line 1, in <module>
    coefs(arr3, arr4)

  File "<ipython-input-124-7532b8516784>", line 2, in coefs
    return np.dot(np.linalg.inv(np.dot(x.T, x)), np.dot(x.T, y))

ValueError: shapes (5,9,2) and (2,9,5) not aligned: 2 (dim 2) != 9 (dim 1)

...because NumPy is treating each array as an object as it should. What I want to do instead is apply the coefs() function to each of the 2 elements along the 0 axis of the arrays, element-wise. Here's a crude way of doing this:

tgt = []
for i, j in zip(arr3, arr4):
    tgt.append(coefs(i, j))

np.array(tgt) 
Out[136]: 
array([[-0.34328006, -0.99116672,  1.42757897, -0.06687851, -0.44669182],
       [ 0.44494495, -0.58017705,  0.75825944,  0.18795889,  0.4560851 ]])

My question is, is there a more efficient and pythonic way of doing this than using zip and iterating over, as above? Basically, given two input arrays of shape (2, n, k) and (2, n), I want the array that is returned to be of shape (2, k). Thanks.

Would the first axis length of the `3D` arrays always be `2` like you have for `arr3` and for the `2D` array `arr4`? — Divakar, May 15 '17 at 21:00
@Divakar Short answer is no, I'd like to generalize this just so that the resulting array's first axis length maintains the first axis length of arr3 and arr4. — Brad Solomon, May 16 '17 at 13:12
@BradSolomon Kool. Check out the posted solution that achieves the same? — Divakar, May 16 '17 at 13:15

score 1 · Accepted Answer · edited May 23 '17 at 10:31

For generic shaped 3D and 2D arrays - arr3 and arr4, we can use some np.einsum magic to have a vectorized solution, like so -

dot1 = np.einsum('ijk,ijl->ikl',arr3,arr3)
dot2 = np.einsum('ijk,ij->ik',arr3,arr4)
inv1 = np.linalg.inv(dot1)
tgt_out = np.einsum('ijk,ij->ik',inv1, dot2)

Runtime test

Approaches -

def org_app(arr3, arr4):
    tgt = []
    for i, j in zip(arr3, arr4):
        tgt.append(coefs(i, j))
    return np.array(tgt)

def einsum_app(arr3, arr4):
    dot1 = np.einsum('ijk,ijl->ikl',arr3,arr3)
    dot2 = np.einsum('ijk,ij->ik',arr3,arr4)
    inv1 = np.linalg.inv(dot1)
    return np.einsum('ijk,ij->ik',inv1, dot2)

Timings and verification -

In [215]: arr3 = np.random.rand(50,50,50)
     ...: arr4 = np.random.rand(50,50)
     ...: 

In [216]: np.allclose(org_app(arr3, arr4), einsum_app(arr3, arr4))
Out[216]: True

In [217]: %timeit org_app(arr3, arr4)
100 loops, best of 3: 4.82 ms per loop

In [218]: %timeit einsum_app(arr3, arr4)
100 loops, best of 3: 19.7 ms per loop

Doesn't look like einsum is giving us any benefits here. This is expected because basically einsum is fighting it out against np.dot, which is much better at sum-reduction and even though we are using it in a loop. The only situation/case in which we can give np.dot a fight is when we loop enough and that should make einsum competitive. We are looping for times equal to the length equal of the first axis of the input arrays. Let's increase it and test again -

In [219]: arr3 = np.random.rand(1000,10,10)
     ...: arr4 = np.random.rand(1000,10)
     ...: 

In [220]: %timeit org_app(arr3, arr4)
10 loops, best of 3: 23 ms per loop

In [221]: %timeit einsum_app(arr3, arr4)
100 loops, best of 3: 9.1 ms per loop

einsum definitely winning on this one!

This related post on the fight between np.einsum and np.dot is worth a look.

Also, note that if we need to use the loop based approach, we should look to initialize the output array and then assign the output values from coefs into it rather than appending, as the latter is a slow process.

Apply function along axis over two numpy arrays - shapes not aligned

1 Answers1

Linked