For a problem like this is pays to understand the underlying data structures for the different formats:
In [672]: A=sparse.csr_matrix(np.arange(24).reshape(4,6))
In [673]: A.data
Out[673]:
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23], dtype=int32)
In [674]: A.indices
Out[674]: array([1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5], dtype=int32)
In [675]: A.indptr
Out[675]: array([ 0, 5, 11, 17, 23], dtype=int32)
The data
values for a row are a slice within A.data
, but identifying that slice requires some knowledge of the A.indptr
(see below)
For the coo
.
In [676]: Ac=A.tocoo()
In [677]: Ac.data
Out[677]:
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23], dtype=int32)
In [678]: Ac.row
Out[678]: array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3], dtype=int32)
In [679]: Ac.col
Out[679]: array([1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5], dtype=int32)
Note that A.nonzeros()
converts to coo
and returns the row
and col
attributes (more or less - look at its code).
For the lil
format; data is stored by row in lists:
In [680]: Al=A.tolil()
In [681]: Al.data
Out[681]:
array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]], dtype=object)
In [682]: Al.rows
Out[682]:
array([[1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]], dtype=object)
===============
Selecting a row of A
works, though in my experience that tends to be a bit slow, in part because it has to create a new csr
matrix. Also your expression seems wordier than needed.
Looking at my first row which has a 0 element (the others are too dense):
In [691]: A[0, A[0,:].nonzero()[1]].A
Out[691]: array([[1, 2, 3, 4, 5]], dtype=int32)
The whole row, expressed as a dense array is:
In [692]: A[0,:].A
Out[692]: array([[0, 1, 2, 3, 4, 5]], dtype=int32)
but the data
attribute of that row is the same as your selection
In [693]: A[0,:].data
Out[693]: array([1, 2, 3, 4, 5], dtype=int32)
and with the lil
format
In [694]: Al.data[0]
Out[694]: [1, 2, 3, 4, 5]
A[0,:].tocoo()
doesn't add anything.
Direct access to attributes of a csr
and lil
isn't that good when picking columns. For that csc
is better, or lil
of the transpose.
Direct access to the csr
data
, with the aid of indptr
, would be:
In [697]: i=0; A.data[A.indptr[i]:A.indptr[i+1]]
Out[697]: array([1, 2, 3, 4, 5], dtype=int32)
Calculations using the csr
format routinely iterate through indptr
like this, getting the values of each row - but they do this in compiled code.
A recent related topic, seeking the product of nonzero elements by row:
Multiplying column elements of sparse Matrix
There I found the reduceat
using indptr
was quite fast.
Another tool when dealing with sparse matrices is multiplication
In [708]: (sparse.csr_matrix(np.array([1,0,0,0])[None,:])*A)
Out[708]:
<1x6 sparse matrix of type '<class 'numpy.int32'>'
with 5 stored elements in Compressed Sparse Row format>
csr
actually does sum
with this kind of multiplication. And if my memory is correct, it actually performs A[0,:]
this way
Sparse matrix slicing using list of int