1

In python, what is the best way to add a CSR vector to a specific row of a CSR matrix? I found one workaround here, but wondering if there is a better/more efficient way to do this. Would appreciate any help.

Given an NxM CSR matrix A and a 1xM CSR matrix B, and a row index i, the goal is to add B to the i-th row of A efficiently.

Community
  • 1
  • 1
beygel
  • 11
  • 3
  • Please elaborate your question --SO Review – Ravi Teja Kumar Isetty Feb 14 '17 at 16:23
  • you mean in-place? – Paul Panzer Feb 14 '17 at 16:43
  • In place. Another question is adding two CSR matrices (with the same dimensions). These are basic operations but I can't seem to find the right way to do them. – beygel Feb 14 '17 at 16:48
  • Then you're probably familiar with that `SparseEfficiencyWarning` recommending `lil_matrix` instead? I just timed it a bit. Not very scientific, though. Anyway, csr seems to be faster than lil for both adding a row in-place and adding two matrices; and I'm not even counting the cost of conversion, so that's not the solution. – Paul Panzer Feb 14 '17 at 16:57
  • Do the existing row and the vector that you want to add to it share the same sparsity structure? – ali_m Feb 14 '17 at 17:12
  • How do you add a row in place with csr? – beygel Feb 14 '17 at 17:33
  • If they share the same sparsity structure it is easy. I just checked: `csr_matrices` expose their internal structure via the `data`, `indptr` and `indices` attrributes and `data` is writable. meaning you just need to in-place add: `m.data[m.indptr[j]:m.indptr[j+1] += row.data` – Paul Panzer Feb 14 '17 at 18:05
  • Paul, that would have been great but I am getting "ValueError: operands could not be broadcast together with shapes (0) (1,M)". Bow m and row are with shapes (N,M) and (1,M) respectively. – beygel Feb 14 '17 at 18:32
  • The added row is also sparse but doesn't necessarily have the same non-zeros. – beygel Feb 14 '17 at 18:37
  • (Sorry, I mislead before by saying that they have the same sparsity structure.) – beygel Feb 14 '17 at 18:43
  • Oh, sorry, I thought by same sparsity structure you meant same nonzeros. If that's not the case this doesn't work. – Paul Panzer Feb 14 '17 at 18:46

2 Answers2

1

The obvious indexed addition does work. It gives a efficiency warning, but that doesn't mean it is the slowest way, just that you shouldn't count of doing this repeatedly. It suggests working with the lil format, but conversion to that and back probably takes more time than performing the addition to the csr matrix.

In [1049]: B.A
Out[1049]: 
array([[0, 9, 0, 0, 1, 0],
       [2, 0, 5, 0, 0, 9],
       [0, 2, 0, 0, 0, 0],
       [2, 0, 0, 0, 0, 0],
       [0, 9, 5, 3, 0, 7],
       [1, 0, 0, 8, 9, 0]], dtype=int32)
In [1051]: B[1,:] += np.array([1,0,1,0,0,0])
/usr/local/lib/python3.5/dist-packages/scipy/sparse/compressed.py:730: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  SparseEfficiencyWarning)
In [1052]: B
Out[1052]: 
<6x6 sparse matrix of type '<class 'numpy.int32'>'
    with 17 stored elements in Compressed Sparse Row format>
In [1053]: B.A
Out[1053]: 
array([[0, 9, 0, 0, 1, 0],
       [3, 0, 6, 0, 0, 9],
       [0, 2, 0, 0, 0, 0],
       [2, 0, 0, 0, 0, 0],
       [0, 9, 5, 3, 0, 7],
       [1, 0, 0, 8, 9, 0]])

As your linked question shows, it is possible to act directly on the attributes of the sparse matrix. His code shows why there's an efficiency warning - in the general case it has to rebuild the matrix attributes.

lil is more efficient for row replacement because it just has to change a sublist in the matrix .data and .rows attributes. A change in one row doesn't change the attributes of any of the others.

That said, IF your addition has the same sparsity as the original row, it is possible change specific elements of the data attribute without reworking .indices or .indptr. Drawing on the linked code

A.data[:idx_start_row : idx_end_row]

is the slice of A.data that will be changed. You need of course the corresponding slice from the 'vector'.

Starting with the In [1049] B

In [1085]: B.indptr
Out[1085]: array([ 0,  2,  5,  6,  7, 11, 14], dtype=int32)
In [1086]: B.data
Out[1086]: array([9, 1, 2, 5, 9, 2, 2, 9, 5, 3, 7, 1, 8, 9], dtype=int32)
In [1087]: B.indptr[[1,2]]  # row 1
Out[1087]: array([2, 5], dtype=int32)
In [1088]: B.data[2:5]
Out[1088]: array([2, 5, 9], dtype=int32)
In [1089]: B.indices[2:5]   # row 1 column indices
Out[1089]: array([0, 2, 5], dtype=int32)
In [1090]: B.data[2:5] += np.array([1,2,3])
In [1091]: B.A
Out[1091]: 
array([[ 0,  9,  0,  0,  1,  0],
       [ 3,  0,  7,  0,  0, 12],
       [ 0,  2,  0,  0,  0,  0],
       [ 2,  0,  0,  0,  0,  0],
       [ 0,  9,  5,  3,  0,  7],
       [ 1,  0,  0,  8,  9,  0]], dtype=int32)

Notice where the changed values, [3,7,12], are in the lil format:

In [1092]: B.tolil().data
Out[1092]: array([[9, 1], [3, 7, 12], [2], [2], [9, 5, 3, 7], [1, 8, 9]], dtype=object)
hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • Thanks, I will experiment with this. The added row is also sparse but doesn't necessarily have the same non-zeros as the row it's added to. – beygel Feb 14 '17 at 18:37
  • If it doesn't have the sparsity, then you can't use my simple code. You'll have to change the `nnz`, `indices` and `indptr` like the linked answer does. It can be done, but I question whether it is worth the effort. – hpaulj Feb 14 '17 at 18:44
  • Is there a simpler solution that's still efficient? – beygel Feb 14 '17 at 18:47
  • If you have to change the sparsity of a row, efficiency is going to take a hit. Changes like this are never as simple with sparse matrices as with `numpy` arrays. – hpaulj Feb 14 '17 at 18:53
0

csr / csc matrices are efficient for most operations including addition (O(nnz)). However, little changes that affect the sparsity structure such as your example or even switching a single position from 0 to 1 are not because they require a O(nnz) reorganisation of the representation. Values and indices are packed; inserting one, all above need to move.

If you do just a single such operation, my guess would be that you can't easily beat scipy's implementation. However, if you are adding multiple rows for example it may be worthwile first making a sparse matrix of them and then adding that in one go.

Creating a csr matrix by hand from rows, say, is not that difficult. For example if your rows are dense and in order:

row_numbers, indices = np.where(rows)
data = rows[row_numbers, indices]
indptr = np.searchsorted(np.r_[true_row_numbers[row_numbers], N], np.arange(N+1))

If you have a collection of sparse rows and their row numbers:

data = np.r_[tuple([r.data for r in rows])]
indices = np.r_[tuple(r.indices for r in rows])]
jumps = np.add.accumulate([0] + [len(r) for r in rows])
indptr = np.repeat(jumps, np.diff(np.r_[-1, true_row_numbers, N]))
Paul Panzer
  • 51,835
  • 3
  • 54
  • 99