Cope with different slicing-behaviour in scipy.sparse and numpy

Question

Setup

I'm aware that sparse matrices in the scipy.sparse module differ from numpy arrays. I'm also aware of existing questions about slicing sparse matrices. However, most of those deal with the performance of slicing.

My question is instead about how to cope with their different slicing behaviour. Let's create an example:

import numpy as np
from scipy import sparse

matrix = np.asarray([[0,0,0,1], [1,1,0,0], [1,0,1,0], [1,0,0,1], [1,0,0,1], [1,0,0,1]], dtype=bool)
sparse_matrix = sparse.lil_matrix(matrix) # Or another format like .csr_matrix etc.

Given this setup, applying the same slice results in a different output:

matrix[:, 3]
# Output: 
# array([ True, False, False,  True,  True,  True], dtype=bool)

sparse_matrix[:, 3]
# Output:
# matrix([[ True],
#        [False],
#        [False],
#        [ True],
#        [ True],
#        [ True]], dtype=bool)

Question

This is a bit of a bummer, since I need the second case to produce the first output as well. I know that using sparse_matrix.A etc. will give me the desired result. However, converting the sparse matrix to a dense array defeats the purpose of using a sparse matrix in the first place.

So is there a way to get the same slice result without converting the sparse matrix to an array?

Edit: For clarification, since my question might be confusing on this point: the slice on sparse_matrix shall have the same output as the slice on matrix, meaning that sparse_matrix[:, 3] shall output ([ True, False, False, True, True, True]).

You can use `matrix[:, 3:4]` to get an output like that of `sparse_matrix[:, 3]`. — Warren Weckesser, Jul 22 '19 at 14:22
Sparse matrix is like `np.matrix`. Indexing like this returns a matrix which is always 2d. — hpaulj, Jul 22 '19 at 14:23
@WarrenWeckesser: I updated my question to clarify this (sorry, it was confusing): The output of `sparse_matrix` shall be the same as that of `matrix`, not the other way around. — Markus, Jul 22 '19 at 14:27
*"The slice on the `sparse_matrix` shall have the same output as `matrix`"*. The problem is that `matrix` is a 2-d numpy array, so `matrix[:, 3]` is a *one-dimensional* array. `sparse_matrix` is a sparse matrix object, and slicing a sparse matrix always returns another sparse matrix, and these objects are *always* two-dimensional. You would have to do something like `sparse_matrix[:, 3].A.ravel()`. — Warren Weckesser, Jul 22 '19 at 14:30

hpaulj · Accepted Answer · 2019-07-22T16:20:04.813

In [150]: arr = np.asarray([[0,0,0,1], [1,1,0,0], [1,0,1,0], [1,0,0,1], [1,0,0,1], [1,0,0,1]]) 
     ...: M = sparse.lil_matrix(arr) # Or another format like .csr_matrix etc.

A scalar index on an ndarray reduces the number of dimensions by one:

In [151]: arr[:,3]                                                                                           
Out[151]: array([1, 0, 0, 1, 1, 1])

It does not change the number of dimensions of the sparse matrix.

In [152]: M[:,3]                                                                                             
Out[152]: 
<6x1 sparse matrix of type '<class 'numpy.int64'>'
    with 4 stored elements in LInked List format>

This behavior is similar to that of the np.matrix subclass (and MATLAB). A sparse matrix is always 2d.

The dense array display of this matrix:

In [153]: M[:,3].A                                                                                           
Out[153]: 
array([[1],
       [0],
       [0],
       [1],
       [1],
       [1]], dtype=int64)

and the np.matrix display:

In [154]: M[:,3].todense()                                                                                   
Out[154]: 
matrix([[1],
        [0],
        [0],
        [1],
        [1],
        [1]], dtype=int64)

np.matrix has an A1 property, which produces a 1d array (it converts to ndarray and applies ravel):

In [155]: M[:,3].todense().A1                                                                                
Out[155]: array([1, 0, 0, 1, 1, 1], dtype=int64)

ravel, squeeze and scalar indexing are all ways of reducing the dimensions of an ndarray. But they don't work directly on an np.matrix or a sparse matrix.
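As a minimal sketch of that last point, reusing the same `arr` and `M` as above: calling ravel on the np.matrix leaves it 2d, while converting to an ndarray first gives the 1d result. It uses `toarray()`, the spelled-out equivalent of the `.A` shorthand; the variable names are only illustrative.

```python
import numpy as np
from scipy import sparse

arr = np.asarray([[0,0,0,1], [1,1,0,0], [1,0,1,0], [1,0,0,1], [1,0,0,1], [1,0,0,1]])
M = sparse.lil_matrix(arr)

col = M[:, 3]                    # still a sparse matrix, shape (6, 1)
dense_matrix = col.todense()     # np.matrix, shape (6, 1)
raveled = dense_matrix.ravel()   # np.matrix stays 2d: shape (1, 6)

# Convert to a plain ndarray first, then ravel reduces it to 1d.
flat = col.toarray().ravel()     # ndarray, shape (6,)
```

Here `flat` has the same shape and values as `arr[:, 3]`.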

Another example of a 2d sparse matrix:

In [156]: sparse.lil_matrix(arr[:,3])                                                                        
Out[156]: 
<1x6 sparse matrix of type '<class 'numpy.int64'>'
    with 4 stored elements in LInked List format>
In [157]: _.A                                                                                                
Out[157]: array([[1, 0, 0, 1, 1, 1]], dtype=int64)

Note the [[...]]: sparse has added a leading size-1 dimension to the 1d ndarray.
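If the concern is memory, one possible workaround is to slice first and densify only the slice. The sketch below assumes a small helper, `column_as_1d` (a made-up name, not part of scipy), that converts just the selected column to a 1d ndarray; the full matrix is never converted.

```python
import numpy as np
from scipy import sparse

def column_as_1d(sp_matrix, j):
    """Return column j of a 2d scipy.sparse matrix as a 1d ndarray.

    Hypothetical helper: only the sliced column is made dense,
    the rest of sp_matrix stays in its sparse format.
    """
    return sp_matrix[:, j].toarray().ravel()

arr = np.asarray([[0,0,0,1], [1,1,0,0], [1,0,1,0], [1,0,0,1], [1,0,0,1], [1,0,0,1]])
M = sparse.lil_matrix(arr)

result = column_as_1d(M, 3)                          # 1d ndarray of length 6
result_csr = column_as_1d(sparse.csr_matrix(arr), 3)  # same for another format
```

The dense intermediate here is a single column (6 values), not the whole 6x4 matrix.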

Thank you for the clarification. A bit sad though that it's not directly possible to slice the sparse matrix as intended. — Markus, Jul 22 '19 at 20:24
You might want to explore how the matrices are stored. Your `lil` is stored by row. Selecting a row is fairly straightforward; there's even a 'view'-like method. But selecting a column means finding values across rows. Similarly `csr` is optimized for row access, while its transpose gives columns more directly. — hpaulj, Jul 22 '19 at 20:33
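A rough illustration of the storage point in the comment above, using the same example array: `lil` exposes per-row lists through its `rows` and `data` attributes, while a column-oriented format such as `csc` keeps each column's row indices in one contiguous slice. The variable names are only illustrative.

```python
import numpy as np
from scipy import sparse

arr = np.asarray([[0,0,0,1], [1,1,0,0], [1,0,1,0], [1,0,0,1], [1,0,0,1], [1,0,0,1]])
M = sparse.lil_matrix(arr)

# lil keeps one list of column indices and one list of values per row.
row1_cols = list(M.rows[1])   # column indices of the nonzeros in row 1
row1_vals = list(M.data[1])   # the matching values

# csc stores each column contiguously, so column 3 is a single slice
# of the index array.
Mc = M.tocsc()
start, stop = Mc.indptr[3], Mc.indptr[4]
col3_rows = sorted(Mc.indices[start:stop].tolist())  # rows with a nonzero in column 3
```

Reading a row of the `lil` matrix touches one list, whereas reading a column from it has to look into every row's list.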
