speeding up numpy kronecker products

Question

I am working on my first large python project. I have one function which has the following code in it:

            # EXPAND THE EXPECTED VALUE TO APPLY TO ALL STATES,
            # THEN UPDATE fullFnMat
            EV_subset_expand = np.kron(EV_subset, np.ones((nrows, 1)))
            fullFnMat[key] = staticMat[key] + EV_subset_expand

In my code profiler, it seems like this kronecker product is actually taking up a huge amount of time.

Function                                                                                        was called by...
                                                                                                    ncalls  tottime  cumtime
/home/stevejb/myhg/dpsolve/ootest/tests/ddw2011/profile_dir/BellmanEquation.py:17(bellmanFn)    <-      19   37.681   38.768  /home/stevejb/myhg/dpsolve/ootest/tests/ddw2011/profile_dir/dpclient.py:467(solveTheModel)
{numpy.core.multiarray.concatenate}                                                             <-     342   27.319   27.319  /usr/lib/pymodules/python2.7/numpy/lib/shape_base.py:665(kron)
/home/stevejb/myhg/dpsolve/ootest/tests/ddw2011/profile_dir/dpclient.py:467(solveTheModel)      <-       1   11.041   91.781  <string>:1(<module>)
{method 'argsort' of 'numpy.ndarray' objects}                                                   <-      19    7.692    7.692  /usr/lib/pymodules/python2.7/numpy/core/fromnumeric.py:597(argsort)
/usr/lib/pymodules/python2.7/numpy/core/numeric.py:789(outer)                                   <-     171    2.526    2.527  /usr/lib/pymodules/python2.7/numpy/lib/shape_base.py:665(kron)
{method 'max' of 'numpy.ndarray' objects}                                                       <-     209    2.034    2.034  /home/stevejb/myhg/dpsolve/ootest/tests/ddw2011/profile_dir/dpclient.py:391(getValPolMatrices)

Is there a way to get faster kronecker products in Numpy? It seems like it shouldn't take as long as it is.

Downvotes? Serious? I think this is a pretty legitimate question. I'm not trolling / trying to start a flame war / anything. Just trying to make my code faster... — stevejb, Aug 25 '11 at 16:50
Wow that is a long time - do you have a fair number of dimensions and large overall size? It seems to work by using 'outer' to create the values you want, but with twice as many dims as you want; and then by using calls to 'concatenate' to eliminate these dims one at a time, with large amounts of copies at each step. Would be better to allocate the final array using empty(), create a reshaped view of that array with 2* dims, and just do result_view[:] = outerprod to get the whole reshape in one step. Or even just transpose followed by reshape() ? — greggo, Jul 30 '14 at 20:31

JoshAdel · Accepted Answer · 2011-08-25T19:58:25.397

You can certainly take a look at the source for np.kron. It can be found in numpy/lib/shape_base.py, and you can see if there are improvements that can be made or simplifications that might make it more efficient. Alternatively you could write your own using Cython or some other binding to a low level language to try to eek out better performance.

Or as @matt suggested something like the following might be natively faster:

import numpy as np
nrows = 10
a = np.arange(100).reshape(10,10)
b = np.tile(a,nrows).reshape(nrows*a.shape[0],-1) # equiv to np.kron(a,np.ones((nrows,1)))

or:

b = np.repeat(a,nrows*np.ones(a.shape[0],np.int),axis=0)

Timings:

In [80]: %timeit np.tile(a,nrows).reshape(nrows*a.shape[0],-1)
10000 loops, best of 3: 25.5 us per loop

In [81]: %timeit np.kron(a,np.ones((nrows,1)))
10000 loops, best of 3: 117 us per loop

In [91]: %timeit np.repeat(a,nrows*np.ones(a.shape[0],np.int),0)
100000 loops, best of 3: 12.8 us per loop

Using np.repeat for the sized arrays in the above example gives a pretty nice 10x speed-up, which isn't too shabby.

Thanks Josh. I think that the usage np.repeat you created will work perfectly. Otherwise, I can do it in Cython. — stevejb, Aug 25 '11 at 20:52

greggo · Answer 2 · 2014-07-30T20:51:21.127

The following may help (in the general case when one of the arrays is not 'ones'). Example is for two arrays A,B of shape (a,b,c) and (d,e,f); Generalize as needed.

Gets it done in a single 'multiply' op and a quick reshape.

kprod = A[:,newaxis,:,newaxis,:,newaxis] * B[newaxis,:, newaxis,:, newaxis,:]
#
# kprod.shape = (a,d,b,e,c,f) now; is full outer product with desired arrangement
# in memory.
kprod.shape = (a*d,b*e,c*f)  # reshape 'in place'

(maybe this is kron(B,A) instead of kron(A,B); reverse A & B if needed)

score 2 · Answer 3 · answered Aug 25 '11 at 18:36

2

Maybe np.kron() is allocating memory and then you're throwing it away. Try using np.tile() instead. I don't know if that allocates more memory or plays indexing tricks under the covers. If you're only multiplying EV_subset by ones, you don't really need to call np.kron().

answered Aug 25 '11 at 18:36

matt

4,089
1
20
17

Yeah, it makes sense that all of the *1's are redundant. Thanks – stevejb Aug 25 '11 at 20:54

speeding up numpy kronecker products

3 Answers3

Linked