
The np.apply_along_axis() function seems to be very slow (no output after 15 minutes). Is there a fast way to apply a function like this to a long array without having to parallelize the operation? I am specifically talking about arrays with millions of elements.

Here is an example of what I am trying to do. Please ignore the simplistic definition of my_func; the goal is not to multiply the array by 55 (which can of course be done in place anyway), it is just an illustration. In practice, my_func is a little more complicated, takes extra arguments, and as a result each element of a is modified differently, i.e. not just multiplied by 55.

>>> import numpy as np
>>> def my_func(a):
...     return a[0]*55
>>> a = np.ones((200000000,1))
>>> np.apply_along_axis(my_func, 1, a)

Edit:

>>> a = np.ones((20,1))
>>> def my_func(a, i, j):
...     b = np.zeros((2,2))
...     b[0,0] = a[i]
...     b[1,0] = a[i]
...     b[0,1] = a[i]
...     b[1,1] = a[j]
...     return np.linalg.eigh(b)


>>> my_func(a,1,1)
(array([ 0.,  2.]), array([[-0.70710678,  0.70710678],
   [ 0.70710678,  0.70710678]]))
  • Can you give a *runnable*, copy-paste sample for that `linalg.eigh` version? I'm pretty sure it's vectorizable and would like to try. – Veedrac May 24 '14 at 22:58
  • I updated the code above, the goal is to efficiently apply my_func to all the pairs of indices in a. Above my_func is applied to the pair (1,1). – methane May 24 '14 at 23:31
  • Apologies, but I meant one using `apply_along_axis` like you're trying to replace. – Veedrac May 24 '14 at 23:35
  • I am no longer sure apply_along_axis is the best method to use actually. – methane May 24 '14 at 23:43
  • What do you mean by "all pairs of indices"? – Veedrac May 25 '14 at 00:23
  • I mean c(i,j) is: for i,j in itertools.combinations(a.size,2): my_func(a,i,j) – methane May 25 '14 at 00:52
  • If your array contains millions of elements and your code is possibly written in a way that creates temporary arrays, are you sure the run time of 15 min is not coming from the need to use swap space on the disk rather than in memory? – Midnighter May 25 '14 at 11:03
  • I've updated my answer. It seems to me, though, that you want an answer that's over `10⁶ · 10⁶` in size, which is a bit too big to hold in memory no matter how you do it. – Veedrac May 25 '14 at 16:02
  • Thanks! Let me look at the answer in more detail first. The answer for a 20,000x20,000 array should be the number of combinations ~ 20,000x20,000/2 (i.e. 20,000x20,000/2 pairs of 2x2 eigenvectors and 2x1 eigenvalues), which should fit in memory. – methane May 25 '14 at 17:09

2 Answers


np.apply_along_axis is not for speed.

There is no way to apply a pure Python function to every element of a Numpy array without calling it that many times, short of AST rewriting...

Fortunately, there are solutions:

  • Vectorizing

    Although it can be hard to find, this is normally the easiest solution. Find some way to express your calculation in a way that generalizes over the elements, so you can work on the whole matrix at once. This results in the loops being hoisted out of Python and into heavily optimised C and Fortran routines; a minimal sketch follows this list.

  • JITing: Numba and Parakeet, and to a lesser extent PyPy with NumPyPy

    Numba and Parakeet both deal with JITing loops over Numpy data structures, so if you inline the looping into a function (this can be a wrapper function), you can get massive speed boosts almost for free; a sketch follows this list. This depends on the data structures used, though.

  • Symbolic evaluators like Theano and numexpr

    These allow you to use embedded mini-languages to express calculations, which can end up much faster than even the vectorized versions; a numexpr sketch follows this list.

  • Cython and C extensions

    If all else fails, you can always dig down manually to C. Cython hides a lot of the complexity and has a lot of lovely magic too, so it's not always that bad (although it helps to know what you're doing).
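
As a taste of the vectorizing option, the toy my_func from the question reduces to a single whole-array expression. A minimal sketch:

import numpy

a = numpy.ones((200000000, 1))

# one C-level whole-array operation instead of 200 million Python calls
result = a * 55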

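A sketch of the JIT route with Numba, again on the toy my_func (this assumes Numba is installed, and times_55_all is a made-up wrapper name):

import numpy
from numba import njit  # assumption: Numba is available

@njit
def times_55_all(a):
    out = numpy.empty_like(a)
    # the whole loop is compiled to machine code, so there is no
    # per-element Python call overhead
    for i in range(a.shape[0]):
        out[i, 0] = a[i, 0] * 55
    return out

result = times_55_all(numpy.ones((1000000, 1)))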

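And a numexpr sketch of the same toy calculation (this assumes numexpr is installed); evaluate compiles the string expression once and runs it in chunked, multi-threaded C loops:

import numpy
import numexpr  # assumption: numexpr is available

a = numpy.ones((1000000, 1))
result = numexpr.evaluate("a * 55")
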
Here you go.

This is my testing "environment" (you should really have provided this :P):

import itertools
import numpy

a = numpy.arange(200).reshape((200,1)) ** 2

def my_func(a, i, j):
    b = numpy.zeros((2,2))
    b[0,0] = a[i]
    b[1,0] = a[i]
    b[0,1] = a[i]
    b[1,1] = a[j]
    return numpy.linalg.eigh(b)

eigvals = {}
eigvecs = {}

for i, j in itertools.combinations(range(a.size), 2):
    eigvals[i, j], eigvecs[i, j] = my_func(a,i,j)

Now, it's far easier to get all the permutations instead of the combinations, because you can just do this:

# All *permutations*, not combinations
indexes = numpy.mgrid[:a.size, :a.size]

This might seem wasteful, but there are only about twice as many permutations as combinations (n**2 index pairs instead of n*(n-1)/2), so it's not a big deal.
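
For a tiny array the structure is easy to see (a sketch with a.size == 3):

numpy.mgrid[:3, :3]
#>>> array([[[0, 0, 0],
#>>>         [1, 1, 1],
#>>>         [2, 2, 2]],
#>>>
#>>>        [[0, 1, 2],
#>>>         [0, 1, 2],
#>>>         [0, 1, 2]]])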

So we want to use these indexes to get the relevant elements:

# Remove the extra dimension; it's not wanted here!
subs = a[:,0][indexes]
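
subs now holds one 200x200 grid of values per index axis:

subs.shape
#>>> (2, 200, 200)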

and then we can make our matrices:

target = numpy.array([
    [subs[0], subs[0]],
    [subs[0], subs[1]]
])

We need the matrices to be in the last two dimensions:

target.shape
#>>> (2, 2, 200, 200)

target = numpy.swapaxes(target, 0, 2)
target = numpy.swapaxes(target, 1, 3)

target.shape
#>>> (200, 200, 2, 2)

And we can check that it works:

target[10, 20]
#>>> array([[100, 100],
#>>>        [100, 400]])

Yay!

So then we just run numpy.linalg.eigh (note that broadcasting over a stack of matrices like this requires NumPy 1.8+, as noted in the comments below):

values, vectors = numpy.linalg.eigh(target)

And look, it works!

values[10, 20]
#>>> array([  69.72243623,  430.27756377])

eigvals[10, 20]
#>>> array([  69.72243623,  430.27756377])

So then I'd imagine you might want to concatenate these:

numpy.concatenate([values[row, row+1:] for row in range(len(values))])
#>>> array([[  0.00000000e+00,   1.00000000e+00],
#>>>        [  0.00000000e+00,   4.00000000e+00],
#>>>        [  0.00000000e+00,   9.00000000e+00],
#>>>        ..., 
#>>>        [  1.96997462e+02,   7.78160025e+04],
#>>>        [  3.93979696e+02,   7.80160203e+04],
#>>>        [  1.97997475e+02,   7.86070025e+04]])

numpy.concatenate([vectors[row, row+1:] for row in range(len(vectors))])
#>>> array([[[ 1.        ,  0.        ],
#>>>         [ 0.        ,  1.        ]],
#>>> 
#>>>        [[ 1.        ,  0.        ],
#>>>         [ 0.        ,  1.        ]],
#>>> 
#>>>        [[ 1.        ,  0.        ],
#>>>         [ 0.        ,  1.        ]],
#>>> 
#>>>        ..., 
#>>>        [[-0.70890372,  0.70530527],
#>>>         [ 0.70530527,  0.70890372]],
#>>> 
#>>>        [[-0.71070503,  0.70349013],
#>>>         [ 0.70349013,  0.71070503]],
#>>> 
#>>>        [[-0.70889463,  0.7053144 ],
#>>>         [ 0.7053144 ,  0.70889463]]])

It's also possible to do this concatenate loop just after numpy.mgrid to halve the amount of work:

# All *permutations*, not combinations
indexes = numpy.mgrid[:a.size, :a.size]

# Convert to all *combinations* and reduce the dimensionality
indexes = numpy.concatenate([indexes[:, row, row+1:] for row in range(indexes.shape[1])], axis=1)

# Remove the extra dimension; it's not wanted here!
subs = a[:,0][indexes]

target = numpy.array([
    [subs[0], subs[0]],
    [subs[0], subs[1]]
])

target = numpy.rollaxis(target, 2)

values, vectors = numpy.linalg.eigh(target)

Yeah, that last sample is all you need.
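
One bookkeeping note on that last variant: after the combination concatenate, indexes still maps each row of the result back to its (i, j) pair:

indexes[:, 0]
#>>> array([0, 1])

so values[0] and vectors[0] belong to the pair (0, 1).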

Veedrac
  • Thanks! Do you have a good reference link for using JITing? Or can provide an example? – methane May 24 '14 at 20:30
  • What in particular do you want an example for? – Veedrac May 24 '14 at 22:55
  • it seems like target is 3 dimensional, not 4, when i run the code above. Let me look where the problem might be. – methane May 25 '14 at 17:46
  • In the first part I refer to `target` as a 4D array, where 2 dimensions specify `subs[0]` and `subs[1]`, and the second ("It's also possible to do this concatenate...") has it as a 3D array where one axis specifies *both* `subs[0]` and `subs[1]`. – Veedrac May 25 '14 at 18:12
  • I just realised a comment in the second variant was wrong, which might have been confusing you. I've updated. – Veedrac May 25 '14 at 18:15
  • Thanks! Weird but it seems like linalg.eigh does not work with a 3-d array for me even though the documentation says it should take A : (..., M, M) array. Let me double check. I am just using the last snippet of code right now. – methane May 25 '14 at 18:21
  • Are you sure? Try `numpy.linalg.eigh(numpy.array([[[0]]]))` for a minimal example. – Veedrac May 25 '14 at 18:24
  • I think working with stacked arrays is new in numpy 1.8, no? I have numpy 1.7.1 and it seems like linalg.eigh only takes 2D arrays: http://docs.scipy.org/doc/numpy-1.7.0/reference/generated/numpy.linalg.eigh.html#numpy.linalg.eigh – methane May 25 '14 at 18:26
  • yup that seems to be the issue, i updated and it works great. Let me play around with the code a bit before I approve your answer. Thanks for the patience and the help!!! – methane May 25 '14 at 18:30
  • Great. I've just seen Bort's edit, so don't forget to give him love too, even if it's not the answer you're sticking with. – Veedrac May 25 '14 at 18:33
  • yes! I was just going to start looking at their answer now that I've figured out yours. It's interesting to see the different answers and I am actually curious to hear your opinion about the alternative! Thanks! – methane May 25 '14 at 18:36

"and as a result each element of a is modified differently"

If there is no connection between the array elements, then Veedrac's answer summarizes the typical strategies. More often than not, however, a vectorization scheme can be found that massively speeds up the computation. If you provide the relevant code snippets of the function itself, we can give you a much more helpful answer.

Edit:

The following code illustrates how one can vectorize your sample function. Although it is not complete (the per-pair eigenvalue retrieval from the block matrix is left open), it should give you some basic ideas of how to set up such a calculation. Have a look at each matrix and submatrix in the function to see how the calculation is built up. Additionally, I used dense matrices, which will most likely not fit into memory when a has millions of elements and there is a large number of index pairs. But most of the matrices in this calculation are sparse, so you can always convert the code to use sparse matrices. The function now takes the vector a and a vector of index pairs.

import numpy as np

def my_func(a, pairs):
    # define the mask matrix
    g = np.zeros((4,2))
    g[:3,0] = 1
    g[3,1] = 1

    # k is the number of index pairs which need calculation
    k = pairs.shape[0]
    h = np.kron(np.eye(k), g)
    b = np.dot(h, a[pairs.ravel()[:2*k]]) # this matrix product generates your matrix b
    b.shape = -1, 2

    out = np.zeros((2*k, 2*k)) # preallocate memory for the block diagonal matrix

    # make the block diagonal matrix
    for i in range(k):
        out[i*2:(i+1)*2, i*2:(i+1)*2] = b[i*2:(i+1)*2, :]

    # the eigenvalues of each 2x2 block are a subset of the eigenvalues
    # of the large block diagonal matrix
    res = np.linalg.eigh(out)
    # unfortunately eigh sorts the eigenvalues, so to retrieve the
    # correct pair of eigenvalues for each submatrix b, one has to
    # inspect res[1] and pick the corresponding entries;
    # I leave that to you (remember out = res[1] diag(res[0]) res[1].T)
    return res

# vector a
a = np.arange(20)
# define the index pairs for the calculation
pairs = np.asarray([[1,3],[2,7],[1,7],[2,3]])
print(my_func(a, pairs))
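
Incidentally, on NumPy 1.8+ you can sidestep both the block diagonal matrix and the eigenvalue-sorting caveat flagged in the comments above: reshape b into a stack of k 2x2 matrices and let np.linalg.eigh broadcast over the stack, as in Veedrac's answer. A sketch of that replacement for the body of my_func, under that version assumption:

b = np.dot(np.kron(np.eye(k), g), a[pairs.ravel()[:2*k]])
b.shape = -1, 2, 2                   # k stacked 2x2 matrices
values, vectors = np.linalg.eigh(b)  # values[n], vectors[n] belong to pairs[n]
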
Bort
  • Just updated the question. In case it's not clear, the idea is that each element of a is used together with another element of a to form a small 2x2 matrix (i.e. b). – methane May 24 '14 at 20:52