I've recently been working on a project where the majority of my time is spent multiplying a dense matrix A by a sparse vector v (see here). In my attempts to reduce computation, I've noticed that the runtime of A.dot(v) is not affected by the number of zero entries of v.
To explain why I would expect the runtime to improve in this case, let result = A.dot(v), so that result[i] = sum_j(A[i,j]*v[j]) for i = 1...A.shape[0]. If v[j] = 0, then the term A[i,j]*v[j] is zero for every i, so the entire column A[:,j] contributes nothing to the result. In this case, I would therefore expect numpy to simply skip column j, but it seems as if it goes ahead and computes A[i,j]*v[j] anyways.
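As a tiny concrete check of that reasoning (a standalone sketch, separate from the timing script below):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
v = np.array([5., 0.])

# v[1] = 0, so the second column of A cannot affect the product:
print(A.dot(v))             # -> [ 5. 15.]
# same result using only the column where v is non-zero:
print(A[:, :1].dot(v[:1]))  # -> [ 5. 15.]
```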
I went ahead and wrote a short sample script to confirm this behavior below.
import time
import numpy as np

np.__config__.show()  # make sure BLAS/LAPACK is being used
np.random.seed(0)

n_rows, n_cols = 10**5, 10**3

# initialize matrix and vector
A = np.random.rand(n_rows, n_cols)
u = np.random.rand(n_cols)
u = np.require(u, dtype=A.dtype, requirements=['C'])

# time the dense case
start_time = time.time()
A.dot(u)
print("time with %d zero entries: %1.5f seconds" % (np.sum(u == 0.0), time.time() - start_time))

# set all but one entry of u to zero (copy first, so u itself is left intact)
v = u.copy()
set_to_zero = np.random.choice(np.arange(u.shape[0]), size=u.shape[0] - 1, replace=False)
v[set_to_zero] = 0.0

start_time = time.time()
A.dot(v)
print("time with %d zero entries: %1.5f seconds" % (np.sum(v == 0.0), time.time() - start_time))

# what I would really expect it to take
non_zero_index = v != 0.0
A_effective = A[:, non_zero_index]
v_effective = v[non_zero_index]

start_time = time.time()
A_effective.dot(v_effective)
print("expected time with %d zero entries: %1.5f seconds" % (np.sum(v == 0.0), time.time() - start_time))
Running this, I get that the runtime for matrix-vector multiplication is the same whether I use the dense vector u or the sparse one v:
time with 0 zero entries: 0.04279 seconds
time with 999 zero entries: 0.04050 seconds
expected time with 999 zero entries: 0.00466 seconds
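In the meantime, the column-masking trick from the last part of the script can be wrapped in a small helper (just a sketch; dot_skip_zeros is my own name, not a numpy function):

```python
import numpy as np

def dot_skip_zeros(A, v):
    """Compute A.dot(v), touching only the columns of A where v is non-zero."""
    idx = np.flatnonzero(v)
    return A[:, idx].dot(v[idx])

# agrees with the plain dense product
A = np.random.rand(1000, 100)
v = np.zeros(100)
v[[3, 50, 97]] = np.random.rand(3)
assert np.allclose(dot_skip_zeros(A, v), A.dot(v))
```

Note that if v is all zeros, np.flatnonzero returns an empty index array and the helper correctly returns a zero vector of length A.shape[0].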
Is this by design, or am I missing something in the way that I'm running matrix-vector multiplication? As sanity checks, I've made sure that numpy is linked to a BLAS library on my machine and that both arrays are C_CONTIGUOUS (since this is apparently required for numpy to call BLAS).
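One workaround I've considered (assuming scipy is available) is to store v as a sparse row and use sparse-dense multiplication, which does skip the zero entries:

```python
import numpy as np
from scipy.sparse import csr_matrix

A = np.random.rand(1000, 100)
v = np.zeros(100)
v[0] = 1.0

# csr_matrix(v) is a 1 x 100 sparse row; (1 x 100) @ (100 x 1000) -> 1 x 1000,
# which is the same as A.dot(v)
result = csr_matrix(v).dot(A.T).ravel()
assert np.allclose(result, A.dot(v))
```

This only pays off when v is sparse enough to offset the overhead of building the sparse structure, so it is not a drop-in replacement for A.dot(v).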