2

I am looking for the first column containing a nonzero element in a sparse matrix (scipy.sparse.csc_matrix). Actually, the first column starting with the i-th one to contain a nonzero element.

This is part of a certain type of linear equation solver. For dense matrices I had the following: (Relevant line is pcol = ...)

import numpy

D = numpy.matrix([[1,0,0],[2,0,0],[3,0,1]])
i = 1

pcol = i + numpy.argmax(numpy.any(D[:,i:], axis=0))
if pcol != i:
    # Pivot columns i, pcol
    D[:,[i,pcol]] = D[:,[pcol,i]]

print(D)
# Result should be numpy.matrix([[1,0,0],[2,0,0],[3,1,0]])

The above should swap columns 1 and 2. If we set i = 0 instead, D is unchanged since column 0 already contains nonzero entries.

What is an efficient way to do this for scipy.sparse matrices? Are there analogues for the numpy.any() and numpy.argmax() functions?

1k5
  • 342
  • 6
  • 15
  • An explanation for the downvote would be nice... – 1k5 Jun 15 '15 at 10:15
  • I haven't voted on this question. But you can improve it and make it easier to answer by including an [MVCE](http://stackoverflow.com/help/mcve). See [this recent question](http://stackoverflow.com/q/29629821/553404) for an example or [this older question of mine](http://stackoverflow.com/q/22072943/553404). It's also quite difficult to understand what you are asking for in your second sentence, so maybe you could also make this clearer. – YXD Jun 15 '15 at 12:54
  • @YXD Thanks, I tried to make it more clear. The example code can be copy-pasted into a python interpreter. – 1k5 Jun 15 '15 at 13:19
  • Look at the matrix `indptr` attribute. For `csc` it points to the start of successive columns. The 1st value greater the 0 will the 1st column with a nonzero value. – hpaulj Jun 15 '15 at 14:00

1 Answers1

5

With a csc matrix it is easy to find the nonzero columns.

In [302]: arr=sparse.csc_matrix([[0,0,1,2],[0,0,0,2]])

In [303]: arr.A
Out[303]: 
array([[0, 0, 1, 2],
       [0, 0, 0, 2]])

In [304]: arr.indptr
Out[304]: array([0, 0, 0, 1, 3])

In [305]: np.diff(arr.indptr)
Out[305]: array([0, 0, 1, 2])

The last line shows how many nonzero terms there are in each column.

np.nonzero(np.diff(arr.indptr))[0][0] would be the index of the first nonzero value in that diff.

Do the same on a csr matrix for find the 1st nonzero row.

I can elaborate on indptr if you want.

hpaulj
  • 221,503
  • 14
  • 230
  • 353