
I have two NxN matrices that I want to multiply together: A and B. In NumPy, I used:

import numpy as np
C = np.dot(A, B)

However, I happen to know that for matrix B, only row n and column n are non-zero (this comes directly from the analytical formula that produced the matrix and is, without a doubt, always the case).

Hoping to take advantage of this fact and reduce the number of multiplications needed to produce C, I replaced the above with:

import numpy as np

C = np.empty((N, N))  # C must be allocated before filling it in
for row in range(N):
    for col in range(N):
        if col != n:
            # Only B[n, col] is non-zero in this column: one scalar multiplication
            C[row, col] = A[row, n] * B[n, col]
        else:
            C[row, col] = np.dot(A[row], B[:, n])

Analytically, this should reduce the total complexity as follows: in the general case (no special structure, just basic matrix multiplication), C = AB, where A and B are both NxN, is O(N^3). That is, all N rows must multiply all N columns, and each of these dot products contains N multiplications, so O(N·N·N) = O(N^3).

Exploiting the structure of B as I've done above, however, should run in O(N^2 + N^2) = O(2N^2) = O(N^2). That is, all N rows must still meet all N columns, but for every column m != n only one scalar multiplication is required, since only one element of B[:, m] (namely B[n, m]) is non-zero. The case m == n occurs N times (once for each row of A that must multiply column n of B), and each of those requires N scalar multiplications.
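For concreteness, here is a minimal sketch of that structure (the values of N, n, and the random seed are arbitrary, chosen only for illustration): B is a sum of two rank-1 terms, so the full product collapses to one outer product plus one matrix-vector product, i.e. O(N^2) work, all done with vectorized NumPy calls.

```python
import numpy as np

# Illustrative sizes; any N and 0 <= n < N would do.
N, n = 5, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((N, N))

# B has only row n and column n non-zero.
B = np.zeros((N, N))
B[n, :] = rng.standard_normal(N)
B[:, n] = rng.standard_normal(N)

# A @ B = outer(A[:, n], B[n, :]) plus an extra contribution to column n.
col = B[:, n].copy()
col[n] = 0.0                        # avoid counting B[n, n] twice
C_fast = np.outer(A[:, n], B[n, :])  # O(N^2)
C_fast[:, n] += A @ col              # O(N^2)

print(np.allclose(C_fast, A @ B))  # prints True
```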

However, the first block of code (using np.dot(A, B)) is substantially faster. I'm aware (via questions like "Why is matrix multiplication faster with numpy than with ctypes in Python?") that the low-level implementation details of np.dot are likely to blame for this. So my question is this: how can I exploit the structure of matrix B to improve multiplication efficiency without sacrificing the implementation efficiency of NumPy, and without building my own low-level matrix multiplication in C?

This method is part of a numerical optimization over many variables, so O(N^3) is intractable whereas O(N^2) will likely get the job done.

Thank you for any help. Also, I'm new to SO, so please pardon any newbie errors.

NLi10Me

  • Have you considered `cython` or some other way of compiling your multiplication function directly into machine code? In the good ole' days, I probably would have used `f2py` for this, but I know that not everyone wants to write code in fortran ;-) – mgilson Dec 09 '13 at 00:48
  • I'm also not completely sure about this, but scipy might have solved some similar problem using sparse matrices. Any scipy gurus know? – mgilson Dec 09 '13 at 00:50
  • Take a look at `scipy.sparse`. You can make `B` a sparse matrix with `B = scipy.sparse.csr_matrix(B)` and then just do `A * B`; if you multiply dense by sparse, the result is dense. My gut feeling is that this is more efficient, but I have not tested it. – Akavall Dec 09 '13 at 00:56
  • Thanks for the quick replies guys! Akavall, I'll look up `scipy.sparse`. First I have to confirm that A*B, where B is of type scipy.sparse.csr_matrix, gives the same result as np.dot(A, B), and if it's faster, then great! I'm still open to other methods in case either equality or efficiency there don't pan out. – NLi10Me Dec 09 '13 at 01:11

2 Answers


If I understood A and B correctly, then I do not understand the for loops: why not just multiply by the two non-zero vectors?

# say A & B are like this:
n, N = 3, 5
A = np.random.randn(N, N)

B = np.zeros_like(A)
B[n] = np.random.randn(N)      # row n
B[:, n] = np.random.randn(N)   # column n

Take the non-zero row and column of B:

rowb, colb = B[n, :], np.copy(B[:, n])
colb[n] = 0   # zeroed here so B[n, n] is not counted twice

Multiply A by those two vectors:

X = np.outer(A[:, n], rowb)
X[:, n] += np.dot(A, colb)

To verify, check that the difference is zero (up to floating point):

X - np.dot(A, B)

With N = 100:

%timeit np.dot(A, B)
1000 loops, best of 3: 1.39 ms per loop

%timeit colb = np.copy( B[:,n] ); colb[ n ] = 0; X = np.outer( A[:,n], B[n,:] ); X[:,n] += np.dot( A, colb )
10000 loops, best of 3: 98.5 µs per loop
behzad.nouri

  • Aha! I believe you are correct, it is not necessary for me to do the scalar multiplications manually! So, I didn't simplify the math/theory enough before implementing. Thank you for your insight! – NLi10Me Dec 09 '13 at 18:28

I timed it, and using sparse is faster:

import numpy as np
from scipy import sparse

from timeit import timeit

A = np.random.rand(100, 100)
B = np.zeros(A.shape)

B[3] = np.random.rand(100)
B[:, 3] = np.random.rand(100)

sparse_B = sparse.csr_matrix(B)

n = 1000

t1 = timeit('np.dot(A, B)', 'from __main__ import np, A, B', number=n)
print('dense way : {}'.format(t1))
t2 = timeit('A * sparse_B', 'from __main__ import A, sparse_B', number=n)
print('sparse way : {}'.format(t2))

Result:

dense way : 1.15117192268
sparse way : 0.113152980804
>>> np.allclose(np.dot(A, B), A * sparse_B)
True

As the number of rows of B increases, so should the time advantage of multiplication using the sparse matrix.
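That scaling claim can be checked with a rough sketch along these lines (the sizes and repetition count are arbitrary, and absolute timings are machine-dependent): the dense product does O(N^3) work while the sparse one stays O(N^2), so the speed ratio should grow roughly linearly with N.

```python
import numpy as np
from scipy import sparse
from timeit import timeit

for N in (100, 200, 400):
    A = np.random.rand(N, N)
    B = np.zeros((N, N))
    B[3, :] = np.random.rand(N)   # only row 3 ...
    B[:, 3] = np.random.rand(N)   # ... and column 3 are non-zero
    sB = sparse.csr_matrix(B)

    t_dense = timeit(lambda: np.dot(A, B), number=20)
    t_sparse = timeit(lambda: A * sB, number=20)  # dense * csr_matrix is a matrix product
    print(N, round(t_dense / t_sparse, 1))
```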

Akavall

  • This is great, thanks! I'm noticing the solution above is slightly quicker and doesn't require the extra (sparse) library; however, this solution is more flexible. Also, the solution above really just points out a flaw in my implementation, as opposed to a direct solution to the original problem, which this is closer to. Thanks! – NLi10Me Dec 09 '13 at 18:33