Consider A, an n-by-j matrix, and B, an m-by-j matrix, both SciPy sparse matrices, with m < n. Is there a way to find the indices of the rows of A that are identical to some row of B?

I have tried for loops and converting the matrices to NumPy arrays, but neither works in my case because the matrices are huge. Here is the link to the same question for NumPy arrays.
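For small matrices, the dense route can be sketched as follows: densify both with `toarray()`, turn each row of B into a hashable tuple, and keep the rows of A found in that set. This is a minimal sketch, not the code from the linked question, and the name `matching_rows_dense` is hypothetical. It needs O((n + m) · j) memory for the dense copies, which is exactly what breaks down for huge matrices, but it is handy as a reference for checking a sparse method on small examples.

```python
import numpy as np
from scipy.sparse import csr_matrix


def matching_rows_dense(A, B):
    """Hypothetical reference helper: densifies both inputs, so small matrices only."""
    # Hash every row of B as a tuple of its values.
    b_rows = {tuple(row) for row in B.toarray()}
    # Keep the indices of the rows of A that appear in that set.
    return [i for i, row in enumerate(A.toarray()) if tuple(row) in b_rows]


A = csr_matrix(np.array([[1, 0, 4], [0, 0, 5], [3, 3, 6], [0, 0, 0], [0, 0, 0]]))
B = csr_matrix(np.array([[1, 0, 4], [0, 0, 5], [2, 3, 6], [0, 0, 0]]))
print(matching_rows_dense(A, B))  # [0, 1, 3, 4]
```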

Edit:

An example of A, B, and the desired output:

>>> import numpy as np
>>> from scipy.sparse import csc_matrix

>>> row = np.array([0, 2, 2, 0, 1, 2])
>>> col = np.array([0, 0, 1, 2, 2, 2])
>>> data = np.array([1, 3, 3, 4, 5, 6])

>>> A = csc_matrix((data, (row, col)), shape=(5, 3))
>>> A.toarray()
array([[1, 0, 4],
       [0, 0, 5],
       [3, 3, 6],
       [0, 0, 0],
       [0, 0, 0]])

>>> row = np.array([0, 2, 2, 0, 1, 2])
>>> col = np.array([0, 0, 1, 2, 2, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> B = csc_matrix((data, (row, col)), shape=(4, 3))
>>> B.toarray()
array([[1, 0, 4],
       [0, 0, 5],
       [2, 3, 6], 
       [0, 0, 0]])

Desired output:

def some_function(A, B):
    # Some operations
    return indices

>>> some_function(A, B)
[0, 1, 3, 4]
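One way to fill in `some_function` without densifying, along the lines of the "sets of tuples derived from the structure members" and `indptr` suggestions in the comments: convert both matrices to canonical CSR, read each row's column indices and values through `indptr`, and use that pair of tuples as a set key. This is a minimal sketch, assuming both inputs can be converted with `csr_matrix` and that rows count as identical when their nonzero entries are equal (explicitly stored zeros are dropped first); the helper `_row_keys` is hypothetical. It still loops over rows in Python, but each step touches only that row's stored entries, so the cost is roughly linear in n + m + nnz rather than O(n · m · j).

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix


def _row_keys(M):
    """Hypothetical helper: yield one hashable key per row of M."""
    M = csr_matrix(M, copy=True)  # work on a copy; the caller's matrix is untouched
    M.sum_duplicates()            # merge duplicate entries and sort column indices
    M.eliminate_zeros()           # drop explicitly stored zeros
    indptr, indices, data = M.indptr, M.indices, M.data
    for r in range(M.shape[0]):
        start, end = indptr[r], indptr[r + 1]
        # (column indices, values) of the row's nonzeros; .tolist() yields plain
        # Python numbers, so int and float matrices compare by value.
        yield (tuple(indices[start:end].tolist()), tuple(data[start:end].tolist()))


def some_function(A, B):
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns")
    b_keys = set(_row_keys(B))
    return [i for i, key in enumerate(_row_keys(A)) if key in b_keys]


row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
A = csc_matrix((np.array([1, 3, 3, 4, 5, 6]), (row, col)), shape=(5, 3))
B = csc_matrix((np.array([1, 2, 3, 4, 5, 6]), (row, col)), shape=(4, 3))
print(some_function(A, B))  # [0, 1, 3, 4]
```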
Comments:
  • No. `scipy.sparse` matrices are good for linear algebra kinds of things, like matrix multiplication. In fact they implement indexing with that kind of multiplication. They don't `broadcast` as in your link, and row by row iteration is slow. The best you can do is work with the `indptr` of the csr format directly. – hpaulj Jan 30 '23 at 17:29
  • Maybe you could `','.join` each row into a string and hash it, keeping a dict that maps hash to row indexes. Collect the hashes of the two matrices into two sets and perform set intersection, then with the resulting set and the dict find out the indices of the identical rows. – Fractalism Jan 30 '23 at 17:35
  • Don't convert to a string. Ensure that the sparse classes are the same, form sets of tuples derived from the structure members, and perform set intersection. But none of this is reproducible with no data and no code. – Reinderien Jan 30 '23 at 19:17
  • Thank you for your comments. I've edited my question. Now, it has data and desired output. – Hamid Jan 31 '23 at 00:47
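Building on the remark that sparse matrices are good at matrix multiplication, another option is random-projection fingerprinting. This is a swapped-in technique, not something proposed in the thread. Each row gets a scalar fingerprint from a single sparse mat-vec, fingerprints are matched with NumPy sorting, and candidate pairs are then verified exactly by subtracting the selected rows. This is a minimal sketch; `matching_rows_fingerprint` and `_canonical_csr` are hypothetical names. It rests on three assumptions:

- values are exactly representable in float64;
- SciPy's CSR mat-vec sums each row's stored entries in stored order, so identical canonical rows get bit-identical fingerprints;
- distinct rows of B do not share a fingerprint (such a collision could hide a match; false positives are removed by the exact check).

Because it avoids a Python-level loop over rows, it may scale better for very large n.

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix


def _canonical_csr(M):
    """Hypothetical helper: float64 CSR copy with sorted indices and no stored zeros."""
    M = csr_matrix(M, copy=True).astype(np.float64)
    M.sum_duplicates()   # merges duplicates and sorts column indices
    M.eliminate_zeros()
    return M


def matching_rows_fingerprint(A, B, seed=0):
    """Hypothetical helper: indices of rows of A that also occur as rows of B."""
    A, B = _canonical_csr(A), _canonical_csr(B)
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns")
    if B.shape[0] == 0:
        return []
    # One sparse mat-vec per matrix: row i gets the fingerprint <row_i, r>.
    # Assumes identical canonical rows give bit-identical sums (see lead-in).
    r = np.random.default_rng(seed).standard_normal(A.shape[1])
    fa = A @ r
    fb = B @ r
    # For each row of A, look up a row of B with the same fingerprint (if any).
    uniq, first = np.unique(fb, return_index=True)
    pos = np.minimum(np.searchsorted(uniq, fa), len(uniq) - 1)
    hit = uniq[pos] == fa
    ia = np.flatnonzero(hit)   # candidate rows of A
    kb = first[pos[hit]]       # one same-fingerprint row of B for each candidate
    # Exact check: a candidate pair matches iff the row difference has no nonzeros.
    D = A[ia] - B[kb]
    D.eliminate_zeros()
    same = np.diff(D.indptr) == 0
    return ia[same].tolist()


row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
A = csc_matrix((np.array([1, 3, 3, 4, 5, 6]), (row, col)), shape=(5, 3))
B = csc_matrix((np.array([1, 2, 3, 4, 5, 6]), (row, col)), shape=(4, 3))
print(matching_rows_fingerprint(A, B))  # [0, 1, 3, 4]
```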