Take the max along an axis of a sparse matrix after already calculating the argmax along that axis

Question

I want to take both the argmax and the max along an axis of a `scipy.sparse` matrix `X`:

>>> type(X)
scipy.sparse.csr.csr_matrix

>>> idx = X.argmax(axis=0)

>>> maxes = X.max(axis=0)

I don't want to have to compute the max twice, but I can't use the same approach I would use if `X` were an `np.ndarray`. How can I apply the indices from `argmax` to `X` to get the corresponding max values?

I can imagine copying the underlying code for sparse `argmax` to return both the index and the value. But short of that, I suspect this dual evaluation will be fastest. You can't simply transfer dense array intuitions to sparse ones. – hpaulj, Oct 10 '18 at 17:23
@hpaulj that's lame to hear, I wish I had sparse matrix intuitions... I wonder whether the time it takes me to dive into the source code will be less than the time I'll save. – swhat, Oct 10 '18 at 17:40
CSR `argmax` uses `indptr` to iterate over the rows of the matrix, and then finds the max on each row. That's conceptually simple, except for the possibility that the row is all 0s, or that 0 itself is the min or max. It may be easier to visualize the rows of a matrix when using the `lil` format. – hpaulj, Oct 10 '18 at 18:25
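
The single-pass idea from the comments above could look roughly like the sketch below. It is not SciPy's own code: the function name `argmax_and_max_axis0` is made up for illustration, it works column by column on a CSC copy (since the question uses `axis=0`), and ties involving zeros may be resolved differently from `X.argmax`.

```python
import numpy as np
from scipy.sparse import csr_matrix


def argmax_and_max_axis0(X):
    """Hypothetical helper: row index of the max and the max, per column."""
    X = X.tocsc()        # each column becomes one slice of data/indices
    X.sum_duplicates()   # merge duplicate entries so each row appears once
    n_rows, n_cols = X.shape
    idx = np.zeros(n_cols, dtype=np.intp)
    maxes = np.zeros(n_cols, dtype=X.dtype)
    for j in range(n_cols):
        start, end = X.indptr[j], X.indptr[j + 1]
        if start == end:
            continue     # no stored entries: max is 0, reported at row 0
        data = X.data[start:end]
        rows = X.indices[start:end]
        k = data.argmax()
        if data[k] > 0 or end - start == n_rows:
            # The stored max wins, or the column has no implicit zeros.
            idx[j], maxes[j] = rows[k], data[k]
        else:
            # An implicit zero is the max; report the first unstored row.
            idx[j] = np.setdiff1d(np.arange(n_rows), rows)[0]
    return idx, maxes


X = csr_matrix([[4, 0, 0], [0, 3, 0], [0, 0, 1]])
idx, maxes = argmax_and_max_axis0(X)
print(idx, maxes)
```

The Python-level loop and the `np.setdiff1d` call are written for clarity, not speed, so on a large matrix this may well be slower than calling `argmax` and `max` separately.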

Hemerson Tacon · Answer 1 · 2018-10-10T17:42:58.400

I managed to get the result you want by adapting the approach that you linked:

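import numpy as np
from scipy.sparse import csr_matrix

a = [[4, 0, 0], [0, 3, 0], [0, 0, 1]]
a = csr_matrix(a)
idx = a.argmax(axis=0)  # row index of the max in each column
m = a.shape[1]          # number of columns
# Pair each column index with the row index of its max.
a[idx, np.arange(m)[None, :]].toarray()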

Outputs:

array([[4, 3, 1]], dtype=int32)

edited Oct 10 '18 at 17:42

answered Oct 10 '18 at 15:41

Hemerson Tacon


Your `idx` is over a different axis. In any case, this is actually a bit slower. Sparse matrix indexing isn't as fast as the dense equivalent. – hpaulj Oct 10 '18 at 17:05
@hpaulj Now it's extracting the max over the required axis. I didn't test the solution with a really big matrix to see whether the speed difference you mention is significant. I'm now wondering whether converting to an `np.array` and then performing the `argmax` and `max` would be faster. – Hemerson Tacon Oct 10 '18 at 17:49
@H.Tacon the problem with that is that the dense version is far too large to fit in memory, which is the motivation to use sparse in the first place. – swhat Oct 10 '18 at 17:51
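
To check the speed question raised in these comments on your own data, a rough timing sketch along the following lines may help. The matrix shape, density, and repeat count are arbitrary assumptions, and the helper names `by_indexing` and `by_second_pass` are made up for illustration. The test matrix has no negative values, so indexing with the `argmax` result and calling `max` a second time should agree.

```python
import timeit

import numpy as np
from scipy import sparse

# Assumed test data: a random CSR matrix with values in [0, 1).
X = sparse.random(2000, 500, density=0.01, format="csr", random_state=0)

idx = np.asarray(X.argmax(axis=0)).ravel()  # row of the max in each column
cols = np.arange(X.shape[1])


def by_indexing():
    # Reuse the argmax result through sparse fancy indexing.
    return np.asarray(X[idx, cols]).ravel()


def by_second_pass():
    # Simply ask for the max again.
    return X.max(axis=0).toarray().ravel()


# Both routes should give the same values before comparing their speed.
assert np.allclose(by_indexing(), by_second_pass())

print("indexing:   ", timeit.timeit(by_indexing, number=20))
print("second pass:", timeit.timeit(by_second_pass, number=20))
```

The timings depend on the matrix size, density, and SciPy version, so it is worth running this on a matrix shaped like the real one.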
