
I have two matrices, 100k×200 and 200×100k. If they were small matrices I would just use the NumPy dot product:

np.sum(a.dot(b), axis=0)

However, the matrices are too big, and I also can't use loops. Is there a smart way of doing this?

kmario23
Aya Abdelsalam

3 Answers


A possible optimization is

>>> numpy.sum(a @ b, axis=0)
array([  1.83633615,  18.71643672,  15.26981078, -46.33670382,  13.30276476])
>>> numpy.sum(a, axis=0) @ b
array([  1.83633615,  18.71643672,  15.26981078, -46.33670382,  13.30276476])

Computing a @ b requires 100k×200×100k multiplications, while summing the rows of a first reduces the multiplication to a 1×200×100k operation, giving a 100k× improvement.

This is mainly due to recognizing

   numpy.sum(x, axis=0) == [1, 1, ..., 1] @ x
=> numpy.sum(a @ b, axis=0) == [1, 1, ..., 1] @ (a @ b)
                            == ([1, 1, ..., 1] @ a) @ b
                            == numpy.sum(a, axis=0) @ b

Similarly for the other axis:

>>> numpy.sum(a @ b, axis=1)
array([  2.8794171 ,   9.12128399,  14.52009991,  -8.70177811, -15.0303783 ])
>>> a @ numpy.sum(b, axis=1)
array([  2.8794171 ,   9.12128399,  14.52009991,  -8.70177811, -15.0303783 ])

(Note: x @ y is equivalent to x.dot(y) for 2D matrices and 1D vectors on Python 3.5+ with NumPy 1.10.0+.)
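Both identities are easy to sanity-check on small random matrices before trusting them at 100k scale (the shapes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((6, 4))
b = rng.random((4, 5))

# Summing the product's rows equals multiplying a's column sums by b
assert np.allclose(np.sum(a @ b, axis=0), np.sum(a, axis=0) @ b)
# Summing the product's columns equals multiplying a by b's row sums
assert np.allclose(np.sum(a @ b, axis=1), a @ np.sum(b, axis=1))
```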


$ INITIALIZATION='import numpy;numpy.random.seed(0);a=numpy.random.randn(1000,200);b=numpy.random.rand(200,1000)'

$ python3 -m timeit -s "$INITIALIZATION" 'numpy.einsum("ij,jk->k", a, b)'
10 loops, best of 3: 87.2 msec per loop

$ python3 -m timeit -s "$INITIALIZATION" 'numpy.sum(a@b, axis=0)'
100 loops, best of 3: 12.8 msec per loop

$ python3 -m timeit -s "$INITIALIZATION" 'numpy.sum(a, axis=0)@b'
1000 loops, best of 3: 300 usec per loop

Illustration:

In [235]: a = np.random.rand(3,3)
array([[ 0.465,  0.758,  0.641],
       [ 0.897,  0.673,  0.742],
       [ 0.763,  0.274,  0.485]])

In [237]: b = np.random.rand(3,2)
array([[ 0.303,  0.378],
       [ 0.039,  0.095],
       [ 0.192,  0.668]])

Now, if we simply do a @ b, we would need 18 multiplications and 12 additions. On the other hand, if we do np.sum(a, axis=0) @ b, we would only need 6 multiplications (plus a cheap row sum of a), an improvement of 3x because a has 3 rows. For the OP's case this gives a 100k times improvement over the plain a @ b computation, since a has 100k rows.
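The multiplication counts can be spelled out directly: an (m, n) @ (n, p) product costs m·n·p multiplications, so pre-summing the rows of a divides the cost by m. A small sketch:

```python
# Multiplication count of an (m, n) @ (n, p) matrix product is m * n * p
m, n, p = 3, 3, 2            # shapes of a and b in the illustration above
full = m * n * p             # plain a @ b
reduced = 1 * n * p          # np.sum(a, axis=0) @ b: a (1, n) @ (n, p) product
print(full, reduced, full // reduced)  # 18 6 3
```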

kennytm

There are two sum-reductions happening - one from the matrix multiplication with np.dot, and then the explicit sum.

We could use np.einsum to do both of those in one go, like so -

np.einsum('ij,jk->k',a,b)

Sample run -

In [27]: a = np.random.rand(3,4)

In [28]: b = np.random.rand(4,3)

In [29]: np.sum(a.dot(b), axis = 0)
Out[29]: array([ 2.70084316,  3.07448582,  3.28690401])

In [30]: np.einsum('ij,jk->k',a,b)
Out[30]: array([ 2.70084316,  3.07448582,  3.28690401])

Runtime test -

In [45]: a = np.random.rand(1000,200)

In [46]: b = np.random.rand(200,1000)

In [47]: %timeit np.sum(a.dot(b), axis = 0)
100 loops, best of 3: 5.5 ms per loop

In [48]: %timeit np.einsum('ij,jk->k',a,b)
10 loops, best of 3: 71.8 ms per loop

Sadly, doesn't look like we are doing any better with np.einsum.


For changing to np.sum(a.dot(b), axis = 1), just swap the output string notation there - np.einsum('ij,jk->i',a,b), like so -

In [42]: np.sum(a.dot(b), axis = 1)
Out[42]: array([ 3.97805141,  3.2249661 ,  1.85921549])

In [43]: np.einsum('ij,jk->i',a,b)
Out[43]: array([ 3.97805141,  3.2249661 ,  1.85921549])
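Both subscript variants can be checked against the explicit sums in one place (the shapes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random((5, 7))
b = rng.random((7, 4))

# 'ij,jk->k' sums out i as well as j: the column sums of a @ b
assert np.allclose(np.einsum('ij,jk->k', a, b), np.sum(a.dot(b), axis=0))
# 'ij,jk->i' sums out k as well as j: the row sums of a @ b
assert np.allclose(np.einsum('ij,jk->i', a, b), np.sum(a.dot(b), axis=1))
```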
Divakar
  • Thank you if I wanted to make it for axis=1 what would I change? – Aya Abdelsalam Mar 09 '17 at 09:11
  • I tried it it has been running for over 5 mins now I guess its slow when it comes to big matrix – Aya Abdelsalam Mar 09 '17 at 09:28
  • @AyaAbdelsalam Yup! That's what I found out. – Divakar Mar 09 '17 at 09:29
  • `ij,jk->k` proves the winning solution. We can sum on `i` first, then do the `j` dot. I wonder how `np.einsum('j,jk->k', np.einsum('ij->j', a), b)` would do. – hpaulj Mar 09 '17 at 17:28
  • @hpaulj I think `np.einsum('j,jk->k',` part is better off as a matrix multiplication with `dot` as done in the other solution. – Divakar Mar 09 '17 at 17:30
  • Yes, the `dot` helps a bit. Seems the big time savings is the `ij->j` step; `einsum` is faster than `np.sum`. However the OP's arrays are 100x bigger, and `einsum` often falls behind when arrays get really big. – hpaulj Mar 09 '17 at 17:43

Some quick time tests using the idea I added to Divakar's answer:

In [162]: a = np.random.rand(1000,200)
In [163]: b = np.random.rand(200,1000)

In [174]: timeit c1=np.sum(a.dot(b), axis=0)
10 loops, best of 3: 27.7 ms per loop

In [175]: timeit c2=np.sum(a,axis=0).dot(b)
1000 loops, best of 3: 432 µs per loop

In [176]: timeit c3=np.einsum('ij,jk->k',a,b)
10 loops, best of 3: 170 ms per loop

In [177]: timeit c4=np.einsum('j,jk->k', np.einsum('ij->j', a), b)
1000 loops, best of 3: 353 µs per loop

In [178]: timeit np.einsum('ij->j', a) @b
1000 loops, best of 3: 304 µs per loop

einsum is actually faster than np.sum!

In [180]: timeit np.einsum('ij->j', a)
1000 loops, best of 3: 173 µs per loop
In [181]: timeit np.sum(a,0)
1000 loops, best of 3: 312 µs per loop

For larger arrays, the einsum advantage decreases:

In [183]: a = np.random.rand(100000,200)
In [184]: b = np.random.rand(200,100000)
In [185]: timeit np.einsum('ij->j', a) @b
10 loops, best of 3: 51.5 ms per loop
In [186]: timeit c2=np.sum(a,axis=0).dot(b)
10 loops, best of 3: 59.5 ms per loop
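For reference, the fastest variants from these timings can be wrapped as small helpers (the function names below are mine, not from the answers):

```python
import numpy as np

def colsum_matmul(a, b):
    # np.sum(a @ b, axis=0) without materializing the (m, p) product
    return a.sum(axis=0) @ b

def rowsum_matmul(a, b):
    # np.sum(a @ b, axis=1) without materializing the (m, p) product
    return a @ b.sum(axis=1)

rng = np.random.default_rng(1)
a = rng.random((50, 20))
b = rng.random((20, 40))
assert np.allclose(colsum_matmul(a, b), np.sum(a @ b, axis=0))
assert np.allclose(rowsum_matmul(a, b), np.sum(a @ b, axis=1))
```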
hpaulj
  • For your first set, I am getting with `np.einsum('ij->j', a)` : `96.8 µs` and with `np.sum(a,0)` : `74.8 µs`. I am on `NumPy 1.12.0`. I think I remember seeing `einsum` being faster than `sum` but on previous versions of NumPy, if I remember correctly. – Divakar Mar 09 '17 at 18:44
  • Same numpy version here. – hpaulj Mar 09 '17 at 19:02