How can I efficiently find the maximum element and its indices in a scipy csr_matrix?

Question

I have a sparse undirected graph stored in a scipy csr_matrix, and I need to find out the edge with maximum weight, which means that I need to find the maximum value and its corresponding row and column indices (actually I have to find the K largest values but to simplify the problem). Hence I wrote:

M=M.toarray()
for i in range(1,len(M)):
    for j in range(i+1,len(M[i])):
        if M[i][j] > maximum:
            row,col,maximum = i,j,M[i][j]

It seems to be clumsy and performs poorly. Is there any better way to do this?

[Finding maximum value and their indices in a sparse lil_matrix (Scipy/Python)](http://stackoverflow.com/questions/24841271/finding-maximum-value-and-their-indices-in-a-sparse-lil-matrix-scipy-python) this should work. Or you can get the inner np array using `.toarray()` and use `np.unravel_index(a.argmax(), a.shape)` — umutto, Mar 31 '17 at 01:37
Also http://stackoverflow.com/questions/31790819/scipy-sparse-csr-matrix-how-to-get-top-ten-values-and-indices (top 10 per row). — hpaulj, Mar 31 '17 at 01:44

score 1 · Answer 1 · edited May 23 '17 at 12:10

If you want to find the maximum alone, M.max() is enough:

>>> m = scipy.sparse.rand(1000, 1000, format='csr')
>>> type(m)
<class 'scipy.sparse.csr.csr_matrix'>
>>> m.max()
0.99991127228906729

If you want find the index as well, besides converting to a coo_matrix, you can to operate on the .data, .indices and .indptr directly. The relationship between these members is mentioned in the documentation,

csr_matrix((data, indices, indptr), [shape=(M, N)])

is the standard CSR representation where the column indices for row i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]].

So,

>>> m.sort_indices()
>>> numpy.argmax(m.data)
1171
>>> index = _
>>> m.indices[index]
483
>>> col = _
>>> numpy.searchsorted(m.indptr, index, side='right') - 1
116
>>> row = _
>>> m[row, col]
0.99991127228906729

I've tried working on the `.indices` and `.indptr`, only got confused. I'm not fond of the way to manipulate on the class members directly. Thanks. — ddhlfb05PT8q, Mar 31 '17 at 02:10

score 0 · Accepted Answer · edited Jun 20 '20 at 09:12

0

import numpy as np
np.array(np.unravel_index(np.argsort(M.flatten(), axis=0)[-K:], M.shape)).T[::-1]

This returns an array of indices of the K largest values, in descending order. Example:

M=np.ones((10,10))
M[2,6]=10
M[4,3]=12
np.array(np.unravel_index(np.argsort(M.flatten(), axis=0)[-2:], M.shape)).T[::-1]

Output: [[4,3], [2,6]]

Edit:

(Assuming you have a square matrix). If you don't want any diagonal elements returned you could mask the array:

mask = (1-np.identity(M.shape[0]))
np.array(np.unravel_index(np.argsort((M*mask).flatten(), axis=0)[-2:], M.shape)).T[::-1]

edited Jun 20 '20 at 09:12

Community

1
1

answered Mar 31 '17 at 01:40

Robbie

4,672
1
19
24

For sparse `M`, you can't use `M.flatten()`. But `M.A.flatten()` will work. `np.argmax(M.A)` gives the raveled index of the the maximum. In any case the use of `unravel_index` is a good tool if you want the row/col from a dense array. – hpaulj Mar 31 '17 at 01:58
Yes thats correct @hpaulj - or M.toarray() as the OP had :-) – Robbie Mar 31 '17 at 02:00
Your solution works well, but here comes another problem: I don't need the elements on the diagonal like M[3,3],M[4,4]. In the codes I post I use a for-loop to get rid of them. How to do this with your solution? – ddhlfb05PT8q Mar 31 '17 at 02:39

How can I efficiently find the maximum element and its indices in a scipy csr_matrix?

2 Answers2

csr_matrix((data, indices, indptr), [shape=(M, N)])

Edit: