I am trying to speed up an algorithm. The bottleneck is computing "Ax", where A is a large sparse matrix of dimension n x m and x is a dense vector of dimension m. My algorithm selects d specific columns of A out of the m columns, with d << m, along with the corresponding d elements of x. I call these sub_A and sub_x, and I only need to compute the product of sub_A and sub_x.
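To be precise, writing S for the set of d selected column indices, what I need is just the contribution of those columns:

    sub_A · sub_x = Σ_{j in S} A[:, j] * x[j]

which touches only the nonzeros in d of the m columns, so I expected the cost to shrink roughly in proportion to d/m compared with the full product Ax.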
However, I found that this kind of multiplication shows no clear speed-up in scipy. Even when d < m/100, the speed-up is only about 2x, which is strange given how much the second dimension of A has shrunk. I tried similar code in MATLAB and got a much clearer speed-up: with d < m/100, the computation is roughly 50-100 times faster.
I searched around and noticed reports of a bottleneck in the scipy code that makes multiplication of a sparse matrix with a dense NumPy vector quite slow. People suggest using pysparse or cysparse, but those modules stopped being updated several years ago.
Is there any other method in Python that can solve this problem? Otherwise I will have to move my whole project to MATLAB.
I have tried the computation in both Python and MATLAB, with a 99% sparse matrix A and a dense x.
import scipy.sparse as sp
import numpy as np
import time

m = 10000     # number of columns of A (and length of x)
n = 100       # number of rows of A
d = 100       # number of selected columns, d = m/100
times = 100   # repetitions for timing

x = np.ones((m, 1))                               # dense vector
A = sp.random(n, m, density=0.01, format='csr')   # 99% sparse matrix

# Time the full product A @ x
start_time = time.time()
for i in range(times):
    c = A.dot(x)
end_time = time.time()
print("Ax cost:", end_time - start_time)

# Select d columns of A and the corresponding d entries of x
row_indices = np.random.choice(m, d, replace=False)
sub_x = x[row_indices]
sub_A = A[:, row_indices]

# Time the reduced product sub_A @ sub_x
start_time = time.time()
for i in range(times):
    c = sub_A.dot(sub_x)
end_time = time.time()
print("sub_A dot sub_x cost:", end_time - start_time)
The output is
Ax cost: 0.002000093460083008
sub_A dot sub_x cost: 0.0010018348693847656
Even with d = m/100, there is no big difference in computation time.
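Both timings above are only a couple of milliseconds, so per-call overhead could be hiding part of the difference. Here is a sketch of how the same comparison could be timed with timeit over many repetitions; it reuses A, x, sub_A, sub_x from the snippet above, and the repetition count is arbitrary:

from timeit import timeit

# Reuses A, x, sub_A, sub_x from the snippet above.
# Averaging over many calls amortises per-call overhead in the totals
# (repetition count chosen arbitrarily).
reps = 10000
full_time = timeit(lambda: A.dot(x), number=reps)
sub_time = timeit(lambda: sub_A.dot(sub_x), number=reps)

print("A.dot(x)         total:", full_time)
print("sub_A.dot(sub_x) total:", sub_time)
print("observed speed-up:", full_time / sub_time)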