I want to take both the argmax and the max along an axis of a scipy.sparse matrix X:

>>> type(X)
scipy.sparse.csr.csr_matrix

>>> idx = X.argmax(axis=0)

>>> maxes = X.max(axis=0)

I don't want to compute the max twice, but I can't use the same approach as I would if X were an np.ndarray. How can I apply the indices returned by argmax to X?
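For reference, the dense pattern I have in mind looks roughly like the sketch below. The array `D` is made-up example data, and `np.take_along_axis` is just one way to reuse the indices; it is an assumption that this matches the linked approach.

```python
import numpy as np

# Made-up dense example data (not the real X, which is too large to densify).
D = np.array([[4, 0, 2],
              [1, 3, 0]])

# Row index of the maximum in each column.
idx = D.argmax(axis=0)

# Reuse those indices to pick out the maxima instead of calling max() again.
maxes = np.take_along_axis(D, idx[None, :], axis=0).ravel()
```

With a dense array the second step is a cheap gather, which is what I would like to reproduce for the sparse case.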

swhat
  • I can imagine copying the underlying code for sparse `argmax` to return both the index and the value. Short of that, I suspect this dual evaluation will be fastest. You can't simply transfer dense-array intuitions to sparse matrices. – hpaulj Oct 10 '18 at 17:23
  • @hpaulj that's lame to hear, I wish I had sparse matrix intuitions... I wonder if the time it will take me to dive into the source code will be less than the amount of time I'll save. – swhat Oct 10 '18 at 17:40
  • Csr `argmax` uses `indptr` to iterate on the rows of the matrix, and then finds the max on that row. That's conceptually simple, except for the possibility that the row is all 0s, or that 0 itself is the min or max. It may be easier to visualize the rows of a matrix when using the `lil` format. – hpaulj Oct 10 '18 at 18:25
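
The `indptr` walk described in the comments can be sketched for the column case (`axis=0`) as follows. This is a minimal illustration, not SciPy's own implementation: `max_and_argmax_axis0` is a hypothetical helper, it converts to CSC so that each column's stored entries are contiguous, and the plain Python loop is written for clarity rather than speed. Ties are resolved to the lowest row index, which is an assumption chosen to match `np.argmax` on the dense equivalent.

```python
import numpy as np
from scipy.sparse import csr_matrix


def max_and_argmax_axis0(X):
    """Return (maxes, argmaxes) for each column in a single pass.

    Hypothetical helper for illustration; not part of SciPy.
    """
    X = X.tocsc()        # column-compressed: a column's entries are contiguous
    X.sum_duplicates()   # merge repeated entries (in place, values unchanged)
    X.sort_indices()     # row indices ascending within each column
    n_rows, n_cols = X.shape
    maxes = np.zeros(n_cols, dtype=X.dtype)
    argmaxes = np.zeros(n_cols, dtype=np.intp)

    for j in range(n_cols):
        start, end = X.indptr[j], X.indptr[j + 1]
        if start == end:
            continue  # column is all implicit zeros: max 0 at row 0
        vals = X.data[start:end]
        rows = X.indices[start:end]
        k = vals.argmax()  # first (lowest-row) stored maximum
        if vals[k] > 0 or rows.size == n_rows:
            # A positive entry, or a completely filled column: the stored
            # maximum is the true maximum.
            maxes[j] = vals[k]
            argmaxes[j] = rows[k]
            continue
        # Otherwise an implicit zero exists and no stored value beats it,
        # so the maximum is 0. Find the first row with no stored entry.
        gaps = np.flatnonzero(rows != np.arange(rows.size))
        first_implicit = gaps[0] if gaps.size else rows.size
        if vals[k] == 0:
            # An explicitly stored zero may come before the first implicit one.
            first_implicit = min(first_implicit, rows[k])
        argmaxes[j] = first_implicit
    return maxes, argmaxes


a = csr_matrix([[4, 0, 0], [0, 3, 0], [0, 0, 1]])
maxes, argmaxes = max_and_argmax_axis0(a)
# maxes.tolist() -> [4, 3, 1], argmaxes.tolist() -> [0, 1, 2]
```

The two branches after the stored maximum is found correspond to the edge cases mentioned above: an all-zero column, and a column where 0 itself is the maximum. Whether this beats calling `argmax` and `max` separately would need to be measured on the real matrix.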

1 Answer

I managed to get the result you want by adapting the approach you linked:

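import numpy as np
from scipy.sparse import csr_matrix

a = [[4, 0, 0], [0, 3, 0], [0, 0, 1]]
a = csr_matrix(a)
idx = a.argmax(axis=0)  # row index of each column's maximum
m = a.shape[1]
# Pair each row index with its column index to pull out the maxima
a[idx, np.arange(m)[None, :]].toarray()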

Outputs:

array([[4, 3, 1]], dtype=int32)
Hemerson Tacon
  • Your `idx` is over a different axis. In any case, this is actually a bit slower. Sparse matrix indexing isn't as fast as the dense equivalent. – hpaulj Oct 10 '18 at 17:05
  • @hpaulj Now it's extracting the max over the required axis. I didn't test the solution with a really big matrix to see if the speed difference you mention is significant. Now I'm wondering whether converting to a `np.array` and then performing the `argmax` and `max` would be faster. – Hemerson Tacon Oct 10 '18 at 17:49
  • @H.Tacon the problem with that is that the dense version is far too large to fit in memory, which is the motivation to use sparse in the first place. – swhat Oct 10 '18 at 17:51