Csr matrix: How to replace missing value with np.nan instead of 0?

Question

It seems that csr_matrix fills missing values with 0 by default. How can I fill the missing values with np.nan instead?

import numpy as np
from scipy.sparse import csr_matrix
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([0, 2, 3, 4, 5, 6])
csr_matrix((data, (row, col)), shape=(3, 3)).toarray()

Output:

array([[0, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

Expected:

array([[0, np.nan, 2],
       [np.nan, np.nan, 3],
       [4, 5, 6]])

What do you mean by 'missing values'? The `scipy` sparse matrix format(s) stores non-zero values. The rest are 0's. Period, full stop! I doubt if you'll find "missing" in any of the scipy.sparse documentation. — hpaulj, Aug 13 '20 at 07:06
The scipy sparse class, especially the `csr` format, is designed for math, especially linear algebra and matrix multiplication. A `nan` fill would behave in a very different way. — hpaulj, Aug 13 '20 at 12:52

score 2 · Answer 1 · answered Mar 20 '21 at 01:59

Here is a workaround:

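import numpy as np
from scipy.sparse import csr_matrix
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([0, 2, 3, 4, 5, 6])

# Float mask: 1 where an entry is stored, NaN elsewhere (an integer mask cannot hold NaN)
mask = csr_matrix((np.ones(len(data)), (row, col)), shape=(3, 3)).toarray()
mask[mask == 0] = np.nan

# value * 1 keeps stored values (including 0); 0 * NaN gives NaN for missing entries
csr_matrix((data, (row, col)), shape=(3, 3)).toarray() * mask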
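
If the (row, col, data) triplets are still at hand, the sparse matrix is arguably not needed for this at all. A minimal sketch, assuming the same arrays as in the question: start from an all-NaN float array and write the stored values into it. The name `dense` is only for illustration.

```python
import numpy as np

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([0, 2, 3, 4, 5, 6])

# Start from an all-NaN float array; integer arrays cannot hold NaN.
dense = np.full((3, 3), np.nan)

# Write every stored value to its (row, col) position, zeros included.
dense[row, col] = data
print(dense)
```

One caveat: if a (row, col) pair appears more than once, this keeps the last value written, whereas csr_matrix sums duplicates.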

ev-br · Answer 2 · 2020-08-13T07:58:28.717

0

It's not possible with csr_matrix, which by definition stores nonzero elements.

If you really need those nans, just manipulate the dense result.

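# dtype=float is required: an integer array cannot hold NaN
a = csr_matrix((data, (row, col)), shape=(3, 3), dtype=float).toarray()
# note: this also turns explicitly stored zeros into NaN
a[a == 0] = np.nan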
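
On the point about what csr_matrix stores: as far as I can tell, a matrix built from (data, (row, col)) triplets keeps explicitly passed zeros as stored entries until `eliminate_zeros()` is called. It is only the dense conversion that loses the distinction between a stored 0 and a missing entry. A small check, reusing the question's arrays (the names `stored` and `nonzero` are just for illustration):

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([0, 2, 3, 4, 5, 6])

m = csr_matrix((data, (row, col)), shape=(3, 3))

stored = m.nnz               # number of stored entries, explicit zeros included
nonzero = m.count_nonzero()  # number of entries whose value is not zero
print(stored, nonzero)

# Dropping explicit zeros in place removes the stored 0 at position (0, 0).
m.eliminate_zeros()
print(m.nnz)
```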

edited Aug 13 '20 at 07:58

answered Aug 13 '20 at 06:06

1

Hmm, that's not what I expected. My original data contains `0`, and I don't want to change it to `np.nan`. I only want to fill missing values with `np.nan`. BTW, I cannot run your code successfully; it fails with `ValueError: cannot convert float NaN to integer`. – rosefun Aug 13 '20 at 06:21
Well, then csr_matrix cannot meet your expectations :-). W.r.t. the error, it's just what it says: your matrix contains integers, and there is no integer nan. If you want nans, specify `dtype=float`. – ev-br Aug 13 '20 at 08:00
This will convert all 0 values to NaN. It is not a correct solution, as the OP wants a result that retains both the zeros and the NaNs. – JStrahl Feb 03 '22 at 11:39

sjfleming · Answer 3 · 2022-08-12T19:39:37.127

0

import numpy as np
import scipy.sparse as sp

def todense_fill(coo: sp.coo_matrix, fill_value: float) -> np.ndarray:
    """Densify a sparse COO matrix. Same as coo_matrix.todense()
    except it fills missing entries with fill_value instead of 0.
    """
    # Sentinel that temporarily marks explicitly stored zeros
    dummy_value = np.nan if not np.isnan(fill_value) else np.inf
    dummy_check = np.isnan if np.isnan(dummy_value) else np.isinf
    coo = coo.copy().astype(float)
    coo.data[coo.data == 0] = dummy_value
    out = np.array(coo.todense()).squeeze()
    out[out == 0] = fill_value  # remaining zeros are the missing entries
    out[dummy_check(out)] = 0  # restore the stored zeros
    return out
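
The sentinel approach above relies on the stored data never containing the sentinel itself (NaN, or inf when fill_value is NaN). A sketch of an alternative that avoids sentinels, assuming it is acceptable to always return a float array: read the stored coordinates from the COO form and write them into an array pre-filled with fill_value. The helper name `dense_with_fill` is hypothetical, not part of scipy.

```python
import numpy as np
import scipy.sparse as sp

def dense_with_fill(mat, fill_value=np.nan):
    """Hypothetical helper: densify `mat`, putting fill_value where nothing is stored."""
    coo = mat.tocoo(copy=True)
    coo.sum_duplicates()  # merge repeated (row, col) pairs, as a dense conversion would
    out = np.full(coo.shape, fill_value, dtype=float)
    out[coo.row, coo.col] = coo.data  # stored entries, explicit zeros included
    return out

m = sp.csr_matrix(
    ([0, 2, 3, 4, 5, 6], ([0, 0, 1, 2, 2, 2], [0, 2, 2, 0, 1, 2])),
    shape=(3, 3),
)
filled = dense_with_fill(m)
print(filled)
```

This only works if explicit zeros are still stored; calling `eliminate_zeros()` beforehand would make them indistinguishable from missing entries.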

edited Aug 12 '22 at 19:39

answered Aug 12 '22 at 19:38

While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – S.B Aug 19 '22 at 14:43
