
I have to do an iterative calculation with a large matrix: R(t) = M @ R(t-1), where M is n x n and R is n x 1.

if I write this:

for _ in range(iter_num):
    R = M @ R

I suppose this will be very slow, because it has to allocate and create a new array on each iteration. Is there any way to optimize this? (Maybe do it in place?)
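For example, would something like this be the right approach? This is just a sketch of what I mean by "in place": np.matmul can write into a preallocated array via out=, but I don't know whether out= is allowed to alias an input, so this swaps two buffers instead:

import numpy as np

n, iter_num = 1000, 500          # made-up sizes, for illustration
M = np.random.random((n, n))
R = np.random.random(n)

buf = np.empty_like(R)           # allocate the scratch buffer once
for _ in range(iter_num):
    np.matmul(M, R, out=buf)     # write the product into the spare buffer
    R, buf = buf, R              # swap references; nothing is copied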

Ziqi Liu
  • As you can see below, the answer strongly depends on what `n` and `iter_num` are in your case. – Joe Feb 15 '18 at 08:14

4 Answers


A few timings to show that OP's approach is actually quite competitive:

>>> import functools as ft
>>> import numpy as np
>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=1000)
>>> R = np.random.random((200,))
>>> M = np.random.random((200, 200))
>>> def f_op(M, R):
...     for i in range(k):
...         R = M@R
...     return R
... 
>>> def f_pp(M, R):
...     return ft.reduce(np.matmul, (R,) + k * (M.T,))
... 
>>> def f_ag(M, R):
...     return np.linalg.matrix_power(M, k)@R
... 
>>> def f_tai(M, R):
...     return np.linalg.multi_dot([M]*k+[R])
... 
>>> k = 20
>>> repeat('f_op(M, R)', **kwds)
[0.14156094897771254, 0.1264056910004001, 0.12611976702464744]
>>> repeat('f_pp(M, R)', **kwds)
[0.12594187198556028, 0.1227772050187923, 0.12045996301458217]
>>> repeat('f_ag(M, R)', **kwds)
[2.065609384997515, 2.041590739012463, 2.038702343008481]
>>> repeat('f_tai(M, R)', **kwds)
[3.426795684004901, 3.4321794749994297, 3.4208814119920135]
>>>
>>> k = 500
>>> repeat('f_op(M, R)', **kwds)
[3.066054102004273, 3.0294102499901783, 3.020273027010262]
>>> repeat('f_pp(M, R)', **kwds)
[2.891954762977548, 2.8680382019956596, 2.8558325179910753]
>>> repeat('f_ag(M, R)', **kwds)
[5.216210452985251, 5.1636185249954, 5.157578871003352]
Paul Panzer

Using numpy.linalg.multi_dot

np.linalg.multi_dot([M] * iter_num + [R]) 

([M] * iter_num creates a list of references to M.)

Some relevant notes from the documentation:

(multi_dot) Compute the dot product of two or more arrays in a single function call, while automatically selecting the fastest evaluation order.

and

Think of multi_dot as:

def multi_dot(arrays):
    return functools.reduce(np.dot, arrays)

Note that OP's method is actually quite fast. See Paul Panzer's answer for more timing results.

Thanks to Paul Panzer for the suggestion to use references rather than views.
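A quick way to see that the list really holds references to the same object, not copies (the small size here is arbitrary):

import numpy as np

M_small = np.random.random((4, 4))               # arbitrary small matrix
assert all(a is M_small for a in [M_small] * 3)  # same object three times, no copies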

Tai
  • `view` is unnecessary, just do `10 * [M]` or something like that. The one thing that is cheaper than a view is a reference. – Paul Panzer Feb 15 '18 at 07:27
  • @PaulPanzer Thank you. I was thinking about creating a reference but did not come up with the right way to do it. – Tai Feb 15 '18 at 07:31
  • @Joe what's the size of the matrix that you tried? I guess you can see the speed difference if you try a bigger matrix. – Tai Feb 15 '18 at 08:05
  • 50 x 50, could be the problem. Could you also time `np.dot(np.linalg.matrix_power(M, 10), R)`, or is this supposed to be the same as `@`? – Joe Feb 15 '18 at 08:07
  • @Joe You can try 500 by 500. Indeed, when the size is smaller the `matrix_power` method is faster. – Tai Feb 15 '18 at 08:08
  • Yup, I see, and it also depends on `iter_num`: for only a few iterations (50) your approach is faster, but for 200 on my machine `matrix_power` is faster, probably because of the iteration. Can you verify this? – Joe Feb 15 '18 at 08:13
  • @Joe I will do it now – Tai Feb 15 '18 at 08:13
  • I guess what makes `multi_dot` slow is that it has to work out what the best order of multiplications is. Since we already know it's from right to left, we can bypass this. Indeed OP's loop is significantly faster than either of the other methods. – Paul Panzer Feb 15 '18 at 08:18
  • @Joe you are right. It is slow when `iter_num` is 200. – Tai Feb 15 '18 at 08:19
  • Very strongly depends on what OP chooses for `n` and `iter_num` :) – Joe Feb 15 '18 at 08:25
  • @Joe Really? The only way I can see would be to make `iter_num` insanely large (or perhaps `n` ridiculously small). In which case one could start thinking about eigen decomposition. – Paul Panzer Feb 15 '18 at 08:46

Would this work for you?

R_final = np.linalg.matrix_power(M, iter_num) @ R

It seems like you are doing M @ M @ ... @ M @ R, which is just the matrix power M^iter_num applied to R.
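A quick sanity check (the sizes here are made up) that this matches the explicit loop:

import numpy as np

n, iter_num = 50, 10
M = np.random.random((n, n)) * 0.1   # scaled down so the entries stay small
R = np.random.random(n)

expected = R.copy()
for _ in range(iter_num):
    expected = M @ expected          # OP's loop
assert np.allclose(np.linalg.matrix_power(M, iter_num) @ R, expected)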

Aguy
  • Unless `iter_num` is pretty large, that would actually be more expensive, because even a single `M@M` costs O(n^3) while each `M@R(t)` is only O(n^2). – Paul Panzer Feb 15 '18 at 07:13
  • What's "pretty large"? On my machine, for a 100x100 matrix, even for iter_num=10 power wins over dot. – Aguy Feb 15 '18 at 07:38
  • @PaulPanzer I haven't checked the actual code, but if `matrix_power` is doing a spectral decomposition it will be much faster for high powers, as it's only raising the eigenvalues to the x-th power instead of doing repeated `@` operations. – Daniel F Feb 15 '18 at 13:11
  • @DanielF But calculating the full spectral decomposition is really expensive. Besides, a quick google for algorithms yields mostly "power iteration" unless it's a special matrix. My guess would be that `matrix_power` does something more modest, like computing M, M^2, M^4, M^8 and then piecing them together as needed (see the sketch after these comments). – Paul Panzer Feb 15 '18 at 14:02
  • @PaulPanzer Yeah, and I suppose if they were doing spectral decomposition they wouldn't need to limit `n` to integers. – Daniel F Feb 15 '18 at 14:11
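For reference, a minimal sketch of that square-and-multiply scheme (this is an assumption about what `matrix_power` might do internally, not its confirmed implementation):

import numpy as np

def mat_pow_binary(M, p):
    # square-and-multiply: O(log p) matrix products instead of p - 1
    result = np.eye(M.shape[0])
    base = M
    while p:
        if p & 1:                 # this bit of the exponent is set
            result = result @ base
        base = base @ base        # square for the next bit
        p >>= 1
    return result

M = np.random.random((50, 50))
assert np.allclose(mat_pow_binary(M, 10), np.linalg.matrix_power(M, 10))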

Using an explicit spectral decomposition may be useful if iter_num is large compared to n (assuming np.linalg.matrix_power doesn't do this already) and M is diagonalizable:

def mat_pow(a, p):
    vals, vecs = np.linalg.eig(a)
    # vecs is not orthogonal in general, so use inv(vecs) rather than vecs.T
    return vecs @ np.diag(vals**p) @ np.linalg.inv(vecs)

mat_pow(M, iter_num) @ R

If M is symmetric, you could use the even faster np.linalg.eigh.
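For the symmetric case, a sketch along the same lines (eigh returns orthonormal eigenvectors, so here the transpose really is the inverse; an integer p is assumed, since negative eigenvalues raised to fractional powers would produce NaNs):

import numpy as np

def mat_pow_sym(a, p):
    vals, vecs = np.linalg.eigh(a)     # orthonormal eigenvectors of symmetric a
    return (vecs * vals**p) @ vecs.T   # vecs * vals**p scales columns, like vecs @ diag(vals**p)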

Daniel F
  • Actually, one could use the principle behind power-iteration algorithms to optimize the large `iter_num` case: with every iteration the eigenvector with the largest eigenvalue ev_max becomes more dominant as a component of M^k R. So after some iterations R(t+1) will just be ev_max R(t). We just need to check for that every now and then in the iteration, and once it holds with enough accuracy we are basically done (a rough sketch follows below). – Paul Panzer Feb 15 '18 at 14:52
  • @PaulPanzer I wish I could favorite a comment, that gives me an idea that I have absolutely no time to explore right now >. – Daniel F Feb 16 '18 at 07:07
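A rough sketch of that early-exit idea (the check interval and tolerance here are made-up values):

import numpy as np

def iterate_with_early_exit(M, R, iter_num, check_every=10, tol=1e-12):
    # Iterate R <- M @ R, but once R has settled into the dominant
    # eigenvector direction, each further step is just a scaling by the
    # dominant eigenvalue, so the remaining products can be skipped.
    for t in range(iter_num):
        R_new = M @ R
        if t % check_every == 0:
            lam = (R @ R_new) / (R @ R)   # Rayleigh-quotient eigenvalue estimate
            if np.linalg.norm(R_new - lam * R) <= tol * np.linalg.norm(R_new):
                # t + 1 products are done; the rest are scalar multiplications
                return lam ** (iter_num - t - 1) * R_new
        R = R_new
    return R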