I am working with scipy's csc sparse matrix, and currently a major bottleneck in the code is a loop similar to the following:
for i in range(multiply_cols.shape[0]):
    F = F - factor*values[i]*mat.getcol(multiply_cols[i])
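For context, my understanding is that the loop just subtracts a weighted sum of the selected columns of mat from F, so mathematically it should boil down to a single product like the sketch below (same variable names as above; I have not verified this against my real code):
# Sketch only: slice the needed columns out of the CSC matrix and
# multiply by the weights in one sparse matrix-vector product.
F = F - factor * mat[:, multiply_cols].dot(values).reshape(-1, 1)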
The matrices I am working with are extremely large, typically larger than 10**6 x 10**6, so I don't want to convert them to a dense matrix. In fact I am restricted to always keeping the matrix in csc format. My attempts show that converting to coo_matrix or lil_matrix also does not pay off.
Here are my rudimentary attempts using csc, csr, coo and lil:
import numpy as np
from scipy.sparse import csc_matrix

n = 1000
sA = csc_matrix(np.random.rand(n, n))
F = np.random.rand(n, 1)
multiply_cols = np.unique(np.random.randint(0, int(0.6*n), size=n))
values = np.random.rand(multiply_cols.shape[0])
def foo1(mat, F, values, multiply_cols):
    factor = 0.75
    for i in range(multiply_cols.shape[0]):
        F = F - factor*values[i]*mat.getcol(multiply_cols[i])

def foo2(mat, F, values, multiply_cols):
    factor = 0.75
    mat = mat.tocsr()
    for i in range(multiply_cols.shape[0]):
        F = F - factor*values[i]*mat.getcol(multiply_cols[i])

def foo3(mat, F, values, multiply_cols):
    factor = 0.75
    mat = mat.tocoo()
    for i in range(multiply_cols.shape[0]):
        F = F - factor*values[i]*mat.getcol(multiply_cols[i])

def foo4(mat, F, values, multiply_cols):
    factor = 0.75
    mat = mat.tolil()
    for i in range(multiply_cols.shape[0]):
        F = F - factor*values[i]*mat.getcol(multiply_cols[i])
and timing them I get:
In [41]: %timeit foo1(sA,F,values,multiply_cols)
10 loops, best of 3: 133 ms per loop
In [42]: %timeit foo2(sA,F,values,multiply_cols)
1 loop, best of 3: 999 ms per loop
In [43]: %timeit foo3(sA,F,values,multiply_cols)
1 loop, best of 3: 6.38 s per loop
In [44]: %timeit foo4(sA,F,values,multiply_cols)
1 loop, best of 3: 45.1 s per loop
So coo_matrix and lil_matrix are certainly not a good choice here. Does anyone know a faster way of doing this? Is it a good option to retrieve the underlying indptr, indices and data arrays and write a custom cython solution?
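To be concrete about that last idea, this is the kind of thing I have in mind, as a pure-python sketch of the CSC layout that a cython version would operate on (subtract_cols_raw is just a name I made up for illustration, not tested on the real matrices):
import numpy as np
from scipy.sparse import csc_matrix

def subtract_cols_raw(mat, F, values, multiply_cols, factor=0.75):
    # For a csc_matrix, the nonzeros of column j live in
    # mat.data[mat.indptr[j]:mat.indptr[j+1]], with row indices in
    # mat.indices[mat.indptr[j]:mat.indptr[j+1]].
    out = np.array(F, dtype=float, copy=True)
    for v, j in zip(values, multiply_cols):
        start, stop = mat.indptr[j], mat.indptr[j + 1]
        rows = mat.indices[start:stop]
        out[rows, 0] -= factor * v * mat.data[start:stop]
    return out
My thinking is that this avoids building a new one-column sparse matrix on every getcol call, which I suspect is where the time goes, but I don't know whether pushing it into cython is worth it.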