Is there any fast way to find identical rows of two sparse matrices with different sizes?

Question

Consider A, an n by j matrix, and B, an m by j matrix, both SciPy sparse matrices, with m < n. Is there a way to find the indices of the rows of A that are identical to some row of B?

I have tried for loops and converting the matrices to NumPy arrays. Neither works in my case because the matrices are huge. Here is the link to the same question for NumPy arrays.

Edit:

An example of A, B, and the desired output:

>>> import numpy as np
>>> from scipy.sparse import csc_matrix

>>> row = np.array([0, 2, 2, 0, 1, 2])
>>> col = np.array([0, 0, 1, 2, 2, 2])
>>> data = np.array([1, 3, 3, 4, 5, 6])

>>> A = csc_matrix((data, (row, col)), shape=(5, 3))
>>> A.toarray()
array([[1, 0, 4],
       [0, 0, 5],
       [3, 3, 6],
       [0, 0, 0],
       [0, 0, 0]])

>>> row = np.array([0, 2, 2, 0, 1, 2])
>>> col = np.array([0, 0, 1, 2, 2, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> B = csc_matrix((data, (row, col)), shape=(4, 3))
>>> B.toarray()
array([[1, 0, 4],
       [0, 0, 5],
       [2, 3, 6], 
       [0, 0, 0]])

Desired output:

def some_function(A, B):
    # Some operations
    return indices

>>> some_function(A, B)
[0, 1, 3, 4]

No. `scipy.sparse` matrices are good for linear algebra kinds of things, like matrix multiplication. In fact they implement indexing with that kind of multiplication. They don't `broadcast` as in your link, and row by row iteration is slow. The best you can do is work with the `indptr` of the csr format directly. — hpaulj, Jan 30 '23 at 17:29
Maybe you could `','.join` each row into a string and hash it, keeping a dict that maps hash to row indexes. Collect the hashes of the two matrices into two sets and perform set intersection, then with the resulting set and the dict find out the indices of the identical rows. — Fractalism, Jan 30 '23 at 17:35
Don't convert to a string. Ensure that the sparse classes are the same, form sets of tuples derived from the structure members, and perform set intersection. But none of this is reproducible with no data and no code. — Reinderien, Jan 30 '23 at 19:17
Thank you for your comments. I've edited my question. Now, it has data and desired output. — Hamid, Jan 31 '23 at 00:47
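
The suggestions in the comments above (work on the CSR arrays directly, build one tuple per row, and compare with a set) could be sketched roughly as follows. This is only a sketch under some assumptions: the helper names `_row_keys` and `matching_rows` are made up for illustration, "identical" is taken to mean exact equality of the stored values (so floating-point noise, NaN, or -0.0 would need extra handling), and it still runs a Python-level loop over rows, although it never slices the sparse matrix itself or converts it to a dense array.

```python
import numpy as np
from scipy.sparse import csc_matrix


def _row_keys(M, dtype):
    """Return one hashable key per row, built from the CSR arrays."""
    M = M.tocsr(copy=True)   # copy so the caller's matrix is left untouched
    M.sum_duplicates()       # merge repeated (row, col) entries
    M.eliminate_zeros()      # drop explicitly stored zeros
    M.sort_indices()         # canonical column order within each row
    # Cast to common dtypes so the raw bytes of A and B are comparable.
    indices = M.indices.astype(np.int64)
    data = M.data.astype(dtype)
    indptr = M.indptr
    return [
        (indices[s:e].tobytes(), data[s:e].tobytes())
        for s, e in zip(indptr[:-1], indptr[1:])
    ]


def matching_rows(A, B):
    """Indices of the rows of A that are equal to at least one row of B."""
    dtype = np.result_type(A.dtype, B.dtype)
    b_keys = set(_row_keys(B, dtype))
    return [i for i, key in enumerate(_row_keys(A, dtype)) if key in b_keys]


# The example from the question.
row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
A = csc_matrix((np.array([1, 3, 3, 4, 5, 6]), (row, col)), shape=(5, 3))
B = csc_matrix((np.array([1, 2, 3, 4, 5, 6]), (row, col)), shape=(4, 3))

print(matching_rows(A, B))  # [0, 1, 3, 4]
```

Each key holds only the column indices and values of the stored entries of a row, so the work is proportional to the number of nonzeros plus the number of rows. Empty rows all map to the same key, which is why rows 3 and 4 of A both match row 3 of B.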

