Updating a coo matrix on the fly with scipy

Question

I have not found a solution to this problem after searching the site. It's quite simple: I would like to update an already existing coo sparse matrix. So let's say I have initialized a coo matrix:

from scipy.sparse import coo_matrix
import numpy as np
row  = np.array([0, 3, 1, 0])
col  = np.array([0, 3, 1, 2])
data = np.array([4, 5, 7, 9])
a = coo_matrix((data, (row, col)), shape=(4, 4))
a.toarray()
array([[4, 0, 9, 0],
       [0, 7, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 5]])

Fine, but what if I want to start with an empty sparse matrix, initialized with only the shape, and then update the values many times? The only way I have managed so far is to add a new coo matrix to my old one:

a = coo_matrix((4, 4), dtype=np.int8)
a = a + coo_matrix((data, (row, col)), shape=(4, 4))
a.toarray()
array([[4, 0, 9, 0],
       [0, 7, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 5]])

And I would like to update this sparse matrix many times, but this is slow because I construct a new coo matrix (and add it) for every update. There has to be a better way, but I feel like the documentation is a little light (at least what I have read), or I am just not seeing it.

Thanks very much

score 5 · Accepted Answer · edited May 23 '17 at 12:07

When you make a coo matrix this way, it uses your input arrays directly as the attributes of the matrix (provided they have the correct dtype):

In [923]: row  = np.array([0, 3, 1, 0])
     ...: col  = np.array([0, 3, 1, 2])
     ...: data = np.array([4, 5, 7, 9])
     ...: A=sparse.coo_matrix((data, (row, col)), shape=(4, 4))
In [924]: A
Out[924]: 
<4x4 sparse matrix of type '<class 'numpy.int32'>'
    with 4 stored elements in COOrdinate format>
In [925]: A.row
Out[925]: array([0, 3, 1, 0])
In [926]: id(A.row)
Out[926]: 3071951160
In [927]: id(row)
Out[927]: 3071951160

Similarly for A.col, and A.data.

For display and most calculations, the matrix is usually converted to csr format, since many of those operations are not defined for the coo format.

And, as you've no doubt seen, the coo format does not implement indexing, either for fetching or for setting.
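A minimal sketch of the usual workaround for reading values: convert once to csr (which does support indexing) and index the converted copy. The arrays are the ones from the question; the names `A_csr`, `value` and `first_row` are just for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix

row  = np.array([0, 3, 1, 0])
col  = np.array([0, 3, 1, 2])
data = np.array([4, 5, 7, 9])
A = coo_matrix((data, (row, col)), shape=(4, 4))

# Convert once, then index the csr copy; A itself is left unchanged
A_csr = A.tocsr()
value = A_csr[0, 2]                # single element
first_row = A_csr[0].toarray()     # first row as a dense 1x4 array
```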

The lil format is designed for incremental changes. Indexed changes to csr are also possible, but they issue a SparseEfficiencyWarning when they change the sparsity structure.
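As a sketch of the lil route, assuming the same arrays as the question: start from a shape-only `lil_matrix`, apply indexed updates (here `+=`, so repeated updates accumulate), and convert to coo or csr only once the updates are finished.

```python
import numpy as np
from scipy.sparse import lil_matrix

row  = np.array([0, 3, 1, 0])
col  = np.array([0, 3, 1, 2])
data = np.array([4, 5, 7, 9])

# Start from shape only, like the question's empty coo_matrix
L = lil_matrix((4, 4), dtype=np.int64)

# Indexed updates are cheap in lil; += adds to whatever is already stored
for i, j, v in zip(row, col, data):
    L[i, j] += v

L[0, 0] += 1       # a later update to an already-stored entry

A = L.tocoo()      # convert once the updates are done
```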

But coo is often used for building new matrices. For example, in `sparse.bmat` the coo attributes of the component matrices are combined into new arrays, which are then used to construct a new coo matrix.
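To illustrate the idea (not the actual `bmat` source), here is a hypothetical helper `stack_diag_coo` that places one matrix below and to the right of another purely by shifting and concatenating their `row`, `col` and `data` attributes. The result is compared against `sparse.bmat`, which SciPy provides for exactly this kind of block assembly.

```python
import numpy as np
from scipy import sparse

def stack_diag_coo(A, B):
    """Hypothetical helper: put B below and to the right of A using coo attributes."""
    A = sparse.coo_matrix(A)
    B = sparse.coo_matrix(B)
    # Shift B's indices past A's extent, then join the three attribute arrays
    row = np.concatenate([A.row, B.row + A.shape[0]])
    col = np.concatenate([A.col, B.col + A.shape[1]])
    data = np.concatenate([A.data, B.data])
    shape = (A.shape[0] + B.shape[0], A.shape[1] + B.shape[1])
    return sparse.coo_matrix((data, (row, col)), shape=shape)

A = sparse.coo_matrix(
    (np.array([4, 5, 7, 9]), (np.array([0, 3, 1, 0]), np.array([0, 3, 1, 2]))),
    shape=(4, 4),
)
B = sparse.coo_matrix(np.array([[1, 0], [0, 2]]))

C = stack_diag_coo(A, B)
expected = sparse.bmat([[A, None], [None, B]]).toarray()
```

In practice `sparse.bmat` or `sparse.block_diag` should be preferred; the helper only shows why coo is convenient for assembly.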

A good way of building a coo incrementally is to keep concatenating new values to your row, col, and data arrays, and then periodically build a new coo from those.
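A minimal sketch of that accumulate-then-build approach. `CooAccumulator` is a hypothetical name, not a SciPy class: each update only appends arrays to Python lists, and a coo matrix is constructed only when `build()` is called. Duplicate (row, col) pairs are kept in the coo matrix and are summed when it is converted (e.g. by `toarray()` or `tocsr()`), which reproduces the question's `a = a + coo_matrix(...)` behaviour.

```python
import numpy as np
from scipy.sparse import coo_matrix

class CooAccumulator:
    """Hypothetical helper: collect (row, col, data) chunks, build a coo on demand."""

    def __init__(self, shape, dtype=np.int64):
        self.shape = shape
        self.dtype = dtype
        self._rows, self._cols, self._data = [], [], []

    def add(self, row, col, data):
        # Appending to lists is cheap; no sparse matrix is built here
        self._rows.append(np.asarray(row))
        self._cols.append(np.asarray(col))
        self._data.append(np.asarray(data, dtype=self.dtype))

    def build(self):
        # Concatenate all chunks in one go; duplicates are kept, not summed, here
        if not self._data:
            return coo_matrix(self.shape, dtype=self.dtype)
        return coo_matrix(
            (np.concatenate(self._data),
             (np.concatenate(self._rows), np.concatenate(self._cols))),
            shape=self.shape,
        )

acc = CooAccumulator((4, 4))
acc.add([0, 3, 1, 0], [0, 3, 1, 2], [4, 5, 7, 9])
acc.add([0], [0], [1])      # a second update to entry (0, 0)
A = acc.build()
```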

On updating a dok format: How to incrementally create an sparse matrix on python?

putting column into empty sparse matrix

creating a scipy.lil_matrix using a python generator efficiently

I ended up taking your advice on building the row, col, data arrays and building the coo matrices at the end. Nice workaround to the problem, and really it's probably the most sensible way to do it. Thanks. – Canuck, Nov 28 '16 at 21:04

score 3 · Answer 2 · answered Nov 27 '16 at 22:18


I first thought that coo_matrix was immutable, because it supports neither indexing nor indexed assignment. It turns out you can directly mutate the underlying structure of your empty sparse matrix:

from scipy.sparse import coo_matrix
import numpy as np
row  = np.array([0, 3, 1, 0])
col  = np.array([0, 3, 1, 2])
data = np.array([4, 5, 7, 9])

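a = coo_matrix((4, 4), dtype=np.int8)  # empty 4x4 matrix, shape only
print(a.toarray())

# Rebind the underlying coo arrays directly; no new matrix is constructed
a.row = row
a.col = col
a.data = data
print(a.toarray())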
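One pitfall of rebinding the attributes (the kind of thing hpaulj's comment below hints at) is that the constructor's checks and conversions are skipped. For example, the matrix dtype is read from `a.data`, so the `dtype=np.int8` requested at construction is silently lost. A small sketch showing this, and that passing the arrays through the constructor applies the requested dtype again:

```python
import numpy as np
from scipy.sparse import coo_matrix

row  = np.array([0, 3, 1, 0])
col  = np.array([0, 3, 1, 2])
data = np.array([4, 5, 7, 9])

a = coo_matrix((4, 4), dtype=np.int8)
a.row = row
a.col = col
a.data = data

# a.dtype is taken from a.data, so it now matches data.dtype, not int8
print(a.dtype, a.nnz)

# Going through the constructor casts the data to the requested dtype
b = coo_matrix((data, (row, col)), shape=a.shape, dtype=np.int8)
print(b.dtype)
```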

That being said, there might be other sparse formats that are more suitable for this approach.
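For instance, dok (a dictionary keyed by `(row, col)`) is one such format. A minimal sketch, assuming the question's arrays: single-element updates are cheap, and the result can be converted to coo once all updates are applied.

```python
import numpy as np
from scipy.sparse import dok_matrix

row  = np.array([0, 3, 1, 0])
col  = np.array([0, 3, 1, 2])
data = np.array([4, 5, 7, 9])

d = dok_matrix((4, 4), dtype=np.int64)   # shape only, nothing stored yet

# Each += reads the current value (0 if absent) and stores the sum
for i, j, v in zip(row, col, data):
    d[int(i), int(j)] += v

d[0, 0] += 1     # a later update to an existing entry

a = d.tocoo()    # convert to coo (or csr) once the updates are done
```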


Andras Deak -- Слава Україні


The process you have outlined is fairly fast (ns). – Canuck Nov 27 '16 at 22:26
@Canuck thanks for the feedback. Note that as hpaulj hints at in their answer: you can also mutate `row`, `col` and `data` on the fly, without rebinding `a.row` etc. each time. Though I think this only helps if it's easier to mutate the underlying arrays (i.e. if you're not already recreating them later). – Andras Deak -- Слава Україні Nov 27 '16 at 22:29

I'm a little more comfortable doing this direct set with the `lil` format than the `coo`. But we can study the `__init__` for the `coo_matrix` class to see if there are any pitfalls to watch out for (getting dtypes right, correct shape, updating nnz etc). In other words, what does it do besides `self.data=...`. – hpaulj Nov 28 '16 at 21:34
@hpaulj no, I agree, using less hacks and a more suitable format should be the way to go. With your answer having higher score and an accept, mine probably won't mislead anyone. Or is your point that I should add a disclaimer or delete it altogether? To be honest I'm not closely familiar with these sparse classes, so I can't really assess how bad/dangerous my answer is. – Andras Deak -- Слава Україні Nov 28 '16 at 21:51

