Check how many numpy array within a numpy array are equal to other numpy arrays within another numpy array of different size

Question

My problem

Suppose I have

a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])

They are two arrays of different sizes, each containing other arrays (the inner arrays all have the same size!)

I want to count how many items of b (i.e. inner arrays) are also in a. Notice that I am not considering their position!

How can I do that?

My Try

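count = 0
for bitem in b:
    for aitem in a:
        # `aitem == bitem` is an elementwise comparison; a bare `if` on it raises ValueError
        if np.array_equal(aitem, bitem):
            count += 1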

Is there a better way? Ideally a one-liner, maybe with a comprehension.

Got to upvote for the title alone – Mark Baijens Aug 29 '17 at 10:02
@thanks man, appreciate that – Euler_Salter Aug 29 '17 at 10:07

score 3 · Accepted Answer · answered Aug 29 '17 at 21:25

The numpy_indexed package contains efficient (generally O(n log n)) and vectorized solutions to these types of problems:

import numpy_indexed as npi
count = len(npi.intersection(a, b))

Note that this is subtly different from your double loop: for instance, it discards duplicate entries in a and b. If you want to retain duplicates in b, this would work:

count = npi.in_(b, a).sum()

Duplicate entries in a could also be handled by calling npi.count(a) and factoring in its result, but I mention that only for illustration, since I imagine the distinction probably does not matter to you.
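To make the duplicate distinction concrete, here is a minimal sketch in plain Python and NumPy. It does not use numpy_indexed; it swaps in sets of tuples instead, and the repeated row in b is an assumption added purely for illustration. The two counts should correspond to the two semantics described above.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
# Assumption for illustration: b contains the row [5, 6] twice.
b = np.array([[5, 6], [1, 2], [3, 192], [5, 6]])

# Rows become hashable tuples so they can be stored in a set.
a_rows = {tuple(row) for row in a.tolist()}
b_rows = [tuple(row) for row in b.tolist()]

# Unique rows common to both arrays (duplicates discarded).
unique_count = len(a_rows & set(b_rows))

# Rows of b that occur in a, counting repeats in b.
with_duplicates = sum(row in a_rows for row in b_rows)

print(unique_count, with_duplicates)
```

Set lookups keep this roughly linear in the number of rows, at the cost of converting every row to a Python tuple.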

Eelco Hoogendoorn

Do you think these two solutions are faster than the ones described above? – Euler_Salter Aug 30 '17 at 09:09
Those methods are quadratic in both memory and computation; so for anything more than a few array elements, definitely – Eelco Hoogendoorn Aug 30 '17 at 09:39

jdehesa · Answer 2 · 2017-08-29T11:30:03.320

2

Here is a simple way to do it:

a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])

count = np.count_nonzero(
    np.any(np.all(a[:, np.newaxis, :] == b[np.newaxis, :, :], axis=-1), axis=0))

print(count)
>>> 2
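The one-liner can be unpacked into its intermediate steps. This is only a sketch of the same broadcasting idea; the names elementwise, row_matches and b_in_a are introduced here for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
b = np.array([[5, 6], [1, 2], [3, 192]])

# Compare every row of a with every row of b: shape (5, 3, 2).
elementwise = a[:, np.newaxis, :] == b[np.newaxis, :, :]

# A pair of rows matches only if all of its elements match: shape (5, 3).
row_matches = elementwise.all(axis=-1)

# A row of b counts if it matches at least one row of a: shape (3,).
b_in_a = row_matches.any(axis=0)

count = np.count_nonzero(b_in_a)
print(elementwise.shape, row_matches.shape, b_in_a, count)
```

The intermediate array holds len(a) * len(b) * 2 booleans, so memory use grows with the product of the two array lengths.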

edited Aug 29 '17 at 11:30

answered Aug 29 '17 at 10:08

First, shouldn't `count` be 2? There's only 2 matching arrays in the input. You're counting matching elements (5, 6, 1, 2, and 3) instead of matching arrays ((5,6) and (1,2)) – Daniel F Aug 29 '17 at 10:54
Also, `np.logical_or.reduce(.., axis = 0)` is equivalent to `np.any( . . ., axis = 0)`, and using `np.count_nonzero` on a boolean array is wasteful compared to a simple `sum()` – Daniel F Aug 29 '17 at 10:55
@DanielF You're right, I misread the question, I thought it was about common elements in general, not subarrays. `count_nonzero` is significantly faster than `sum` here, though (check `%timeit (np.random.rand(10000000) > .5).sum()` and `%timeit np.count_nonzero(np.random.rand(10000000) > .5)` in IPython). – jdehesa Aug 29 '17 at 11:26

Mohamed Ali JAMAOUI · Answer 3 · 2017-08-30T07:31:05.523

You can do what you want in a one-liner as follows:

count = sum([np.array_equal(x,y) for x,y in product(a,b)])

Explanation

Here's an explanation of what's happening:

1. Iterate over the two arrays with itertools.product, which creates an iterator over the Cartesian product of the two arrays.
2. Compare the two arrays in each tuple (x, y) from step 1 using np.array_equal.
3. Sum the results: True counts as 1 and False as 0 when summed.

Full example:

The final code looks like this:

import numpy as np 
from itertools import product 
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
count = sum([np.array_equal(x,y) for x,y in product(a,b)])
# output: 2
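One caveat worth checking: this approach counts matching pairs, so a row that appears more than once in a is counted once per occurrence. The sketch below uses a modified a (an assumption for illustration) to contrast that with counting each row of b at most once; pair_count and row_count are names introduced here.

```python
import numpy as np
from itertools import product

# Assumption for illustration: the row [1, 2] appears twice in a.
a = np.array([[1, 2], [3, 4], [1, 2]])
b = np.array([[5, 6], [1, 2], [3, 192]])

# Counts matching (row of a, row of b) pairs: [1, 2] in b matches two rows of a.
pair_count = sum(np.array_equal(x, y) for x, y in product(a, b))

# Counts rows of b that match at least one row of a.
row_count = sum(any(np.array_equal(x, y) for x in a) for y in b)

print(pair_count, row_count)
```

Both versions pass a generator expression to sum, which avoids building the intermediate list.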

Daniel F · Answer 4 · 2017-08-29T11:00:33.120

You can convert the rows to dtype = np.void and then use np.in1d on the resulting 1-D arrays

def void_arr(a):
    return np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1]))) 

b[np.in1d(void_arr(b), void_arr(a))]

array([[5, 6],
       [1, 2]])

If you just want the number of intersections, it's

np.in1d(void_arr(b), void_arr(a)).sum()

2

Note: if there are repeat items in b or a, then np.in1d(void_arr(b), void_arr(a)).sum() likely won't be equal to np.in1d(void_arr(a), void_arr(b)).sum(). I've reversed the order from my original answer to match your question (i.e. how many elements of b are in a?)
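The asymmetry is easy to see on a small input. The sketch below does not use the np.void view; it swaps in a broadcasting comparison through a hypothetical helper, rows_in, and the repeated row in b is an assumption added for illustration.

```python
import numpy as np

def rows_in(x, y):
    """Hypothetical helper: for each row of x, is it equal to some row of y?"""
    # (n, 1, k) == (1, m, k) -> (n, m, k); all over k, then any over the rows of y.
    return (x[:, np.newaxis, :] == y[np.newaxis, :, :]).all(axis=-1).any(axis=1)

a = np.array([[1, 2], [3, 4], [5, 6]])
# Assumption for illustration: the row [5, 6] is repeated in b.
b = np.array([[5, 6], [5, 6], [1, 2], [3, 192]])

b_in_a = int(rows_in(b, a).sum())  # how many rows of b are in a
a_in_b = int(rows_in(a, b).sum())  # how many rows of a are in b

print(b_in_a, a_in_b)
```

Here the repeated row is counted twice when scanning b but only once when scanning a, so the two counts differ.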

For more information, see the third answer here
