My goal here is to build the sparse CSR matrix very fast. It is currently the main bottleneck in my process, and I've already optimized it by constructing the coo matrix relatively fast, and then using tocsr().
However, I would imagine that constructing the csr matrix directly must be faster?
I have a very specific format of a sparse matrix that is also large (i.e. on orders of 100000x50000). I've looked online at these other answers, but most are not addressing the question I have.
- Efficiently construct FEM/FVM matrix Looks at constructing a very specific formatted sparse matrix vs using coo, which led to a scipy merge improvement on the speed of tocsr().
Sparse Matrix Structure:
The sparse matrix, H, is comprised of W lists of size N, or built from an initial array of size NxW, lets call it A. Along the diagonal are repeating lists of size N for N times. So for the first N rows of H, is A[:,0] repeated, but sliding along N steps for each row.
Comparison to COO.tocsr()
When I scale up N and W, and build up the COO matrix first, then running tocsr(), it is actually faster then just building the CSR matrix directly. I'm not sure why this would be the case? I am wondering if perhaps I can take advantage of the structure of my sparse matrix H in some way? Since there are many repeating elements in there.
Code Sample
Here is a code sample to visualize what is going on for a small sample size:
from scipy.sparse import linalg, dok_matrix, coo_matrix, csr_matrix
import numpy as np
import matplotlib.pyplot as plt
def test_csr(testdata):
indices = [x for _ in range(W-1) for x in range(N**2)]
ptrs = [N*(i) for i in range(N*(W-1))]
ptrs.append(len(indices))
data = []
# loop along the first axis
for i in range(W-1):
vec = testdata[:,i].squeeze()
# repeat vector N times
for i in range(N):
data.extend(vec)
Hshape = ((N*(W-1), N**2))
H = csr_matrix((data, indices, ptrs), shape=Hshape)
return H
N = 4
W = 8
testdata = np.random.randn(N,W)
print(testdata.shape)
H = test_csr(testdata)
plt.imshow(H.toarray(), cmap='jet')
plt.show()