Intersecting rows across 2 varied sized numpy arrays

Question

Let's assume I have two NumPy arrays:

arr1 = np.array([
       [1, 2, 3, 4],
       [2, 3, 1, 4],
       [2, 4, 1, 5],
       # ...
       ])

arr2 = np.array([
       [2, 4, 1, 5],
       [2, 1, 3, 5],
       [1, 2, 3, 4],
       # ...
       ])

I'd like to get the intersection of the rows of arr1 and arr2. I tried:

intersect = np.intersect1d(arr1, arr2)

and it returns

array([1, 2, 3, 4, 5])

That is an element-wise intersection. I'd like to do this row-wise. The result should be:

array([[2, 4, 1, 5],
       [1, 2, 3, 4]])
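The element-wise result comes from np.intersect1d flattening its inputs: its documentation says arrays that are not already 1-D are flattened. A small sketch, using the three rows shown above for each array, that checks this:

```python
import numpy as np

arr1 = np.array([[1, 2, 3, 4],
                 [2, 3, 1, 4],
                 [2, 4, 1, 5]])
arr2 = np.array([[2, 4, 1, 5],
                 [2, 1, 3, 5],
                 [1, 2, 3, 4]])

# intersect1d treats both inputs as flat sequences of scalars,
# so the row structure is lost before any comparison happens.
flat = np.intersect1d(arr1, arr2)
same = np.intersect1d(arr1.ravel(), arr2.ravel())

print(flat.tolist())               # [1, 2, 3, 4, 5]
print(np.array_equal(flat, same))  # True
```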

Does this answer your question? [Get intersecting rows across two 2D numpy arrays](https://stackoverflow.com/questions/8317022/get-intersecting-rows-across-two-2d-numpy-arrays) — Suraj, Jul 11 '20 at 14:04
@SurajSubramanian Actually, I checked the mentioned answers beforehand. The accepted answer doesn't work in my case because it requires the arrays to be the same size. The pure-Python solution mentioned there doesn't work either. — colt.exe, Jul 11 '20 at 14:20
Some of the answers in the link assume matching sizes, but not all. — hpaulj, Jul 11 '20 at 15:22
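One size-independent option is a minimal sketch built on np.unique with axis=0 (available in NumPy 1.13 and later). It assumes both arrays have the same number of columns and a compatible dtype; the helper name intersect_rows is hypothetical. Each array is de-duplicated, the two are stacked, and rows that then appear twice are the ones present in both:

```python
import numpy as np


def intersect_rows(a, b):
    """Rows present in both 2-D arrays a and b, sorted and de-duplicated.

    Hypothetical helper. Assumes a and b have the same number of columns
    and a compatible dtype; the number of rows may differ.
    """
    # De-duplicate each input first, so a count of 2 below can only
    # mean "this row occurs in both arrays".
    ua = np.unique(a, axis=0)
    ub = np.unique(b, axis=0)
    # Count how often each distinct row appears in the stacked arrays.
    vals, counts = np.unique(np.concatenate([ua, ub]), axis=0,
                             return_counts=True)
    return vals[counts > 1]


arr1 = np.array([[1, 2, 3, 4],
                 [2, 3, 1, 4],
                 [2, 4, 1, 5]])
arr2 = np.array([[2, 4, 1, 5],
                 [2, 1, 3, 5],
                 [1, 2, 3, 4],
                 [1, 1, 1, 1]])   # one extra row, so the sizes differ

common = intersect_rows(arr1, arr2)
print(common.tolist())  # [[1, 2, 3, 4], [2, 4, 1, 5]]
```

This never builds an (n1, n2, ncols) temporary, but the result comes back in sorted order rather than in the order of either input.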

hpaulj · Accepted Answer · 2020-07-11T16:00:05.563

In [3]: arr1 = np.array([ 
   ...:        [1, 2, 3, 4], 
   ...:        [2, 3, 1, 4], 
   ...:        [2, 4, 1, 5], 
   ...:        ]) 
   ...:  
   ...: arr2 = np.array([ 
   ...:        [2, 4, 1, 5], 
   ...:        [2, 1, 3, 5], 
   ...:        [1, 2, 3, 4], 
   ...:        ])

Use broadcasted equality, followed by the appropriate mix of all and any:

In [8]: (arr1[:,None,:]==arr2[None,:,:]).shape                                                       
Out[8]: (3, 3, 4)
In [9]: (arr1[:,None,:]==arr2[None,:,:]).all(axis=2)                                                 
Out[9]: 
array([[False, False,  True],
       [False, False, False],
       [ True, False, False]])
In [10]: (arr1[:,None,:]==arr2[None,:,:]).all(axis=2).any(axis=0)                                    
Out[10]: array([ True, False,  True])

In [12]: arr1[_]                                                                                     
Out[12]: 
array([[1, 2, 3, 4],
       [2, 4, 1, 5]])
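The same steps can be wrapped in a function. This is a sketch assuming 2-D arrays with equal column counts, and the name rows_of_a_in_b is hypothetical. It reduces with any(axis=1), so the mask has one entry per row of the first argument and can index it even when the row counts differ. In [12] above reduces with any(axis=0) instead, which only lines up because both arrays have three rows:

```python
import numpy as np


def rows_of_a_in_b(a, b):
    """Boolean mask with one entry per row of a: True if that row occurs in b.

    Hypothetical helper; assumes 2-D arrays with the same number of columns.
    """
    # eq[i, j, k] is True where a[i, k] == b[j, k]; shape (len(a), len(b), ncols)
    eq = a[:, None, :] == b[None, :, :]
    # all over columns: a[i] equals b[j] exactly.
    # any over axis 1 (b's rows): a[i] equals at least one row of b.
    return eq.all(axis=2).any(axis=1)


arr1 = np.array([[1, 2, 3, 4],
                 [2, 3, 1, 4],
                 [2, 4, 1, 5]])
arr2 = np.array([[2, 4, 1, 5],
                 [2, 1, 3, 5],
                 [1, 2, 3, 4]])

mask = rows_of_a_in_b(arr1, arr2)
print(mask.tolist())        # [True, False, True]
print(arr1[mask].tolist())  # [[1, 2, 3, 4], [2, 4, 1, 5]]
```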

Or, with sets:

In [19]: set([tuple(row) for row in arr1])                                                           
Out[19]: {(1, 2, 3, 4), (2, 3, 1, 4), (2, 4, 1, 5)}
In [20]: set([tuple(row) for row in arr2])                                                           
Out[20]: {(1, 2, 3, 4), (2, 1, 3, 5), (2, 4, 1, 5)}
In [21]: _19.intersection(_20)                                                                       
Out[21]: {(1, 2, 3, 4), (2, 4, 1, 5)}
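The set version returns tuples in no particular order. To get an array back in arr1's row order, the set can be used for membership tests instead. This is a minimal sketch assuming 2-D arrays with equal column counts; the helper name common_rows_via_set is hypothetical. Matching is by exact equality, just like the set intersection:

```python
import numpy as np


def common_rows_via_set(a, b):
    """Rows of a that also occur in b, kept in a's original order.

    Hypothetical helper; assumes 2-D arrays with the same number of columns.
    Rows become tuples so they are hashable and can live in a set.
    """
    b_rows = {tuple(row) for row in b.tolist()}
    keep = np.array([tuple(row) in b_rows for row in a.tolist()], dtype=bool)
    return a[keep]


arr1 = np.array([[1, 2, 3, 4],
                 [2, 3, 1, 4],
                 [2, 4, 1, 5]])
arr2 = np.array([[2, 4, 1, 5],
                 [2, 1, 3, 5],
                 [1, 2, 3, 4],
                 [1, 1, 1, 1]])

result = common_rows_via_set(arr1, arr2)
print(result.tolist())  # [[1, 2, 3, 4], [2, 4, 1, 5]]
```

Because no (len(a), len(b), ncols) temporary is created, this scales to arrays of different sizes without the memory cost of broadcasting.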

===

If I expand arr2 to 4 rows:

...: arr2 = np.array([ 
...:        [2, 4, 1, 5], 
...:        [2, 1, 3, 5], 
...:        [1, 2, 3, 4], 
...:        [1, 1, 1, 1], 
...:        ]) 

In [34]: (arr1[:,None,:]==arr2[None,:,:]).all(axis=2).any(axis=0)                                    
Out[34]: array([ True, False,  True, False])

any on axis 0 produces a 4-element array, which has to be used to index arr2 (not arr1, as I originally did):

In [35]: arr2[_]                                                                                     
Out[35]: 
array([[2, 4, 1, 5],
       [1, 2, 3, 4]])

Or use any along the other axis, and index arr1:

In [36]: (arr1[:,None,:]==arr2[None,:,:]).all(axis=2).any(axis=1)                                    
Out[36]: array([ True, False,  True])
In [37]: arr1[_]                                                                                     
Out[37]: 
array([[1, 2, 3, 4],
       [2, 4, 1, 5]])
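Put side by side, the two reductions pair with different arrays. This small sketch uses the 4-row arr2 from the edit and shows that each mask's length matches the array it indexes. The two results contain the same rows, but each one follows the order of the array it was taken from:

```python
import numpy as np

arr1 = np.array([[1, 2, 3, 4],
                 [2, 3, 1, 4],
                 [2, 4, 1, 5]])
arr2 = np.array([[2, 4, 1, 5],
                 [2, 1, 3, 5],
                 [1, 2, 3, 4],
                 [1, 1, 1, 1]])

# match[i, j] is True when arr1[i] equals arr2[j]; shape (3, 4)
match = (arr1[:, None, :] == arr2[None, :, :]).all(axis=2)

found_in_arr2 = match.any(axis=1)  # shape (3,): one entry per row of arr1
found_in_arr1 = match.any(axis=0)  # shape (4,): one entry per row of arr2

print(arr1[found_in_arr2].tolist())  # [[1, 2, 3, 4], [2, 4, 1, 5]]
print(arr2[found_in_arr1].tolist())  # [[2, 4, 1, 5], [1, 2, 3, 4]]
```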

The all reduction produces (in this case) a (3,4) array:

In [38]: (arr1[:,None,:]==arr2[None,:,:]).all(axis=2)                                                
Out[38]: 
array([[False, False,  True, False],
       [False, False, False, False],
       [ True, False, False, False]])

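any can then reduce either axis: any(axis=1) marks the rows of arr1 that have a match in arr2, and any(axis=0) marks the rows of arr2 that have a match in arr1.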
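If you need to know which row matches which, you don't have to reduce the (3,4) matrix at all. np.nonzero on it returns the index pairs directly, in row-major order. This is a sketch using the same example arrays:

```python
import numpy as np

arr1 = np.array([[1, 2, 3, 4],
                 [2, 3, 1, 4],
                 [2, 4, 1, 5]])
arr2 = np.array([[2, 4, 1, 5],
                 [2, 1, 3, 5],
                 [1, 2, 3, 4],
                 [1, 1, 1, 1]])

match = (arr1[:, None, :] == arr2[None, :, :]).all(axis=2)

# i[k], j[k] is a pair with arr1[i[k]] == arr2[j[k]]
i, j = np.nonzero(match)
print(i.tolist(), j.tolist())  # [0, 2] [2, 0]
```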

`(arr1[:,None,:]==arr2[None,:,:]).shape` returns `(620,593,70)`. Because of this mismatch, `In [12]` raises `IndexError: boolean index did not match indexed array along dimension 0; dimension is 620 but corresponding boolean dimension is 593`. I guess the different sizes of the arrays break it. However, the sets solution works fine. — colt.exe, Jul 11 '20 at 15:32
If the boolean mask has shape (593,), it should index the array with 593 rows. With equal sizes it didn't matter, so I was a bit sloppy about the pairing. With the correct pairing, that approach still works. See my edit. — hpaulj, Jul 11 '20 at 16:02
I'm really grateful for your efforts, @hpaulj. Keep up the good spirit. Nice, informative, all-around answer. — colt.exe, Jul 11 '20 at 16:05
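The mismatch from the comments can be reproduced with hypothetical stand-in data of the reported shapes: 620 and 593 rows, 70 columns, and 5 rows deliberately placed in both arrays. This is a sketch, not the asker's data. It shows that the axis-0 mask fails on arr1 but works on arr2:

```python
import numpy as np

# Hypothetical random data with the shapes from the comment above.
rng = np.random.default_rng(0)
arr1 = rng.integers(0, 10, size=(620, 70))
arr2 = rng.integers(0, 10, size=(593, 70))
arr2[:5] = arr1[10:15]  # plant 5 shared rows

# (620, 593, 70) comparison reduced to a (620, 593) match matrix.
match = (arr1[:, None, :] == arr2[None, :, :]).all(axis=2)

mask_arr1 = match.any(axis=1)  # shape (620,): pairs with arr1
mask_arr2 = match.any(axis=0)  # shape (593,): pairs with arr2

try:
    arr1[mask_arr2]  # wrong pairing, as in In [12]
    wrong_pairing_failed = False
except IndexError as exc:
    wrong_pairing_failed = True
    print("IndexError:", exc)

print(arr1[mask_arr1].shape, arr2[mask_arr2].shape)  # (5, 70) (5, 70)
```

With random digits over 70 columns, accidental matches beyond the planted rows are vanishingly unlikely. Note that the boolean temporary here has 620 × 593 × 70 = 25,736,200 elements (about 26 MB), which is the main cost of the broadcasting approach compared with the set-based one.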
