5

What is the fastest or, failing that, least wordy way of accessing all non-zero values in a row row or column col of a scipy.sparse matrix A in CSR format?

Would doing it in another format (say, COO) be more efficient?

Right now, I use the following:

A[row, A[row, :].nonzero()[1]]

or

A[A[:, col].nonzero()[0], col]
musically_ut
  • 34,028
  • 8
  • 94
  • 106

1 Answers1

11

For a problem like this is pays to understand the underlying data structures for the different formats:

In [672]: A=sparse.csr_matrix(np.arange(24).reshape(4,6))
In [673]: A.data
Out[673]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23], dtype=int32)
In [674]: A.indices
Out[674]: array([1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5], dtype=int32)
In [675]: A.indptr
Out[675]: array([ 0,  5, 11, 17, 23], dtype=int32)

The data values for a row are a slice within A.data, but identifying that slice requires some knowledge of the A.indptr (see below)

For the coo.

In [676]: Ac=A.tocoo()
In [677]: Ac.data
Out[677]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23], dtype=int32)
In [678]: Ac.row
Out[678]: array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3], dtype=int32)
In [679]: Ac.col
Out[679]: array([1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5], dtype=int32)

Note that A.nonzeros() converts to coo and returns the row and col attributes (more or less - look at its code).

For the lil format; data is stored by row in lists:

In [680]: Al=A.tolil()
In [681]: Al.data
Out[681]: 
array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]], dtype=object)
In [682]: Al.rows
Out[682]: 
array([[1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5]], dtype=object)

===============

Selecting a row of A works, though in my experience that tends to be a bit slow, in part because it has to create a new csr matrix. Also your expression seems wordier than needed.

Looking at my first row which has a 0 element (the others are too dense):

In [691]: A[0, A[0,:].nonzero()[1]].A
Out[691]: array([[1, 2, 3, 4, 5]], dtype=int32)

The whole row, expressed as a dense array is:

In [692]: A[0,:].A
Out[692]: array([[0, 1, 2, 3, 4, 5]], dtype=int32)

but the data attribute of that row is the same as your selection

In [693]: A[0,:].data
Out[693]: array([1, 2, 3, 4, 5], dtype=int32)

and with the lil format

In [694]: Al.data[0]
Out[694]: [1, 2, 3, 4, 5]

A[0,:].tocoo() doesn't add anything.

Direct access to attributes of a csr and lil isn't that good when picking columns. For that csc is better, or lil of the transpose.

Direct access to the csr data, with the aid of indptr, would be:

In [697]: i=0; A.data[A.indptr[i]:A.indptr[i+1]]
Out[697]: array([1, 2, 3, 4, 5], dtype=int32)

Calculations using the csr format routinely iterate through indptr like this, getting the values of each row - but they do this in compiled code.

A recent related topic, seeking the product of nonzero elements by row: Multiplying column elements of sparse Matrix

There I found the reduceat using indptr was quite fast.

Another tool when dealing with sparse matrices is multiplication

In [708]: (sparse.csr_matrix(np.array([1,0,0,0])[None,:])*A)
Out[708]: 
<1x6 sparse matrix of type '<class 'numpy.int32'>'
    with 5 stored elements in Compressed Sparse Row format>

csr actually does sum with this kind of multiplication. And if my memory is correct, it actually performs A[0,:] this way

Sparse matrix slicing using list of int

Community
  • 1
  • 1
hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • I did not know of the `.A` attribute (which simplifies my `np.asarray` and `.toarray()`, `.todense()` mess at other parts of my code). The `.data` attribute indeed does give me what I needed. Thanks for the tips regarding `.indptr` as well! – musically_ut Dec 05 '16 at 23:39
  • 1
    My highest voted answer is for an even more obscure property, `M.A1` which turns a `matrix` into a 1d array, http://stackoverflow.com/questions/3337301/numpy-matrix-to-array/20765358#20765358 – hpaulj Dec 06 '16 at 00:13