1

I want to compute the partition (or argpartition, but I guess the problem is quite similar) of a sparse two-dimensional matrix in an efficient way, in order to get its top-k items with low complexity.

Here is what I would do in the case of a dense matrix to get its argpartition line by line:

import numpy as np

n_best = 1
matrix = np.array([[0, 1, 2], [4, 0, 3], [0, 0, 0]])

np.argpartition(matrix, -n_best)[:, -n_best:]

which would give

array([[2],
       [0],
       [2]], dtype=int64)

Now that I have my sparse matrix, I do not want to convert it into a dense array because of its memory size, which would be too high to handle.

The only way I found out to easily compute its argpartition was with a for loop line by line:

from scipy.sparse.csr import csr_matrix

sparse_matrix = csr_matrix(matrix)

sparse_argpartition = []
for row in range(sparse_matrix.shape[0]):
    row_partition = sparse_matrix.getrow(row).indices[
        sparse_matrix.getrow(row).data.argpartition(
            -min(sparse_matrix.getrow(row).getnnz(), n_best)
        )[-n_best:]
    ]
    sparse_argpartition.append(
        np.pad(row_partition, (0, n_best - len(row_partition)))
    )

sparse_argpartition = np.array(sparse_argpartition)

I wonder if there is some parallelized way to efficiently perform this operation?

  • https://pypi.org/project/sparse-dot-topn/ this has worked well for me in the past. – CJR Jul 12 '21 at 15:28
  • close duplicate: https://stackoverflow.com/questions/49207275/finding-the-top-n-values-in-a-row-of-a-scipy-sparse-matrix – hpaulj Jul 12 '21 at 15:55
  • @hpaulj this does not really answer my question, as I already have a way to perform this operation within a `for` loop. I wanted to know if there was a parallelized way to do it, which does not seem to be the case with the solution you sent. – MarinPitavy Jul 13 '21 at 11:34

0 Answers0