I have a large sparse matrix in python, and would like to perform many elementary row operations on it. That is, adding a multiple of one row to another row. What is the most efficient way of doing this? Elementary row operations are possible with a lil matrix, but very slow. csc and csr matrices do not support these operations.
-
AFAIK, each time you add a non-zero entry to an existing sparse matrix, the whole matrix needs to be copied (array elements are stored in order, and this order needs to be amended with a new entry; also the memory needed to encode the matrix changes). Addition will always be slow for sparse matrices. If the matrix is too large to load into memory, I would suggest working on a dense matrix saved on the hard disk in an indexable format such as hdf5. – Paul Brodersen Mar 20 '17 at 16:04
1 Answers
First, a MCVexample would help the question a lot. I can only speculate about your row operations.
A fundamental question - do the rows differ in their sparsity structure? Take the lil
case. If 2 rows have the same rows
lists, then your math can work with the data
lists directly. If rows
differ than math becomes much more complicated, since you have to change both lists.
Both lil
and csr
support indexing by row
In [7]: M=sparse.rand(10,10,.3)
In [8]: Mr=M.tocsr()
In [9]: Ml=M.tolil()
Yes, csr
gives a warning if you change a row by adding another:
In [17]: Ml[2,:] += Ml[1,:]
In [18]: Mr[2,:] += Mr[1,:]
...
SparseEfficiencyWarning)
But the lil
math actually uses a csr
intermediary. lil
rows are represented as lists, not arrays.
In [14]: Ml[1,:]+Ml[2,:]
Out[14]:
<1x10 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
Indexed matrix operations are slow, especially compared to the dense array equivalents. But they take care of a lot of little details for you.
I've explored row operations in other SO answers. When I have a better idea of what you are trying to do, I search those.
Overall, there isn't a magic bullet, especially if you are changing sparsity. scipy sparse
isn't the best tool for fast row calculations.
scipy: Adding a sparse vector to a specific row of a sparse matrix - this one is close enough that I'm tempted to flag this question as a duplicate.
Extremely slow sum row operation in Sparse LIL matrix in Python
(more in a SO search on 'user:901925 [scipy] rows'
)
-
Unfortunately, two rows of my matrix will generally not have the same sparsity structure. Thank you for referring me to the other links. I didn't know about them. – user3433489 Mar 20 '17 at 17:43
-
For me, (the poster) this solution worked: Express the matrix as a 2-level nested dictionary where row i, column j is accessed through d[i][j]. Adding two rows is then 'adding' d[i1] to d[i2], which is discussed [here](http://stackoverflow.com/questions/11011756/is-there-any-pythonic-way-to-combine-two-dicts-adding-values-for-keys-that-appe). – user3433489 Mar 21 '17 at 13:45