5

For example, if I have:

import numpy as np
A = np.array([[2,3,4],[5,6,7]])

and I want to check whether the following list is identical to one of the rows of the array:

B = [2,3,4]

I tried

B in A #which returns True

But the following also returns True, which should be false:

B = [2,2,2]
B in A

6 Answers

6

Try this generator comprehension. The builtin any() short-circuits so that you don't have extra evaluations that you don't need.

any(np.array_equal(row, B) for row in A)

For now, np.array_equal doesn't implement internal short-circuiting. In a different question the performance impact of different ways of accomplishing this is discussed.
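To make the short-circuiting concrete, here is a minimal sketch that records which rows actually get compared. The `rows_equal` wrapper and the `compared` list are hypothetical helpers added purely for illustration; they are not part of NumPy.

```python
import numpy as np

A = np.array([[2, 3, 4], [5, 6, 7], [8, 9, 10]])
B = [2, 3, 4]

compared = []  # records every row that was actually compared


def rows_equal(row, target):
    # Hypothetical helper: wraps np.array_equal and logs each call.
    compared.append(row.tolist())
    return np.array_equal(row, target)


# any() stops pulling rows from the generator at the first True.
found = any(rows_equal(row, B) for row in A)

print(found)     # True
print(compared)  # [[2, 3, 4]]: only the first row was examined
```

If `B` is not present at all, every row is compared, so the benefit depends on how often (and how early) a match is expected.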

As @Dan mentions below, broadcasting is another valid way to solve this problem, and it's often (though not always) a better way. For some rough heuristics, here's how you might want to choose between the two approaches. As with any other micro-optimization, benchmark your results.

Generator Comprehension

  • Reduced memory footprint (not creating the array B==A)
  • Short-circuiting (if the first row of A is B, we don't have to look at the rest)
  • When rows are large (definition depends on your system, but could be ~100 - 100,000), broadcasting isn't noticeably faster.
  • Uses builtin language features. You have numpy installed anyway, but I'm partial to using the core language when there isn't a reason to do otherwise.

Broadcasting

  • Fastest way to solve an extremely broad range of problems using numpy. Using it here is good practice.
  • If we do have to search through every row in A (i.e. if more often than not we expect B to not be in A), broadcasting will almost always be faster (not always a lot faster necessarily, see next point)
  • When rows are smallish, the generator expression won't be able to vectorize the computations efficiently, so broadcasting will be substantially faster (unless of course you have enough rows that short-circuiting outweighs that concern).
  • In a broader context where you have more numpy code, the use of broadcasting here can help to have more consistent patterns in your code base. Coworkers and future you will appreciate not having a mix of coding styles and patterns.
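As a rough sketch of the two approaches side by side, the functions below wrap each one so they can be swapped and benchmarked on your own data. The function names are made up for this example.

```python
import numpy as np


def row_in_array_generator(A, B):
    # Compares one row at a time and stops at the first match.
    return any(np.array_equal(row, B) for row in A)


def row_in_array_broadcast(A, B):
    # Builds the full boolean array A == B, then reduces it:
    # all(axis=1) checks whole rows, any() checks for at least one hit.
    return bool((np.asarray(A) == np.asarray(B)).all(axis=1).any())


A = np.array([[2, 3, 4], [5, 6, 7]])
for B in ([2, 3, 4], [2, 2, 2], [5, 6, 7]):
    print(B, row_in_array_generator(A, B), row_in_array_broadcast(A, B))
```

Both should agree on every input; which one is faster depends on the shape of `A` and on how early a match tends to occur, so timing them (for example with `timeit`) on representative data is the only reliable guide.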
Hans Musgrave
  • "it looks like B in A is being interpreted as np.isin(B, A).all()" I don't think so, try [1,3,1] for example. I think it is checking each column and returns true if the number is in that column of any row. – Dan Oct 29 '18 at 15:18
  • You're right that we don't really need a *generator* comprehension, but that depends on the anticipated workload. When there are largeish rows the performance difference is negligible other than the comprehension having reduced memory consumption, and the early stopping can allow this to be substantially faster if we would typically expect `A` to contain `B` in the sense OP expects. – Hans Musgrave Oct 29 '18 at 15:22
  • "Doesn't require a conversion of B to an array " -- turns out the broadcasting is happy with lists as well as arrays. I don't know if it takes a performance hit or not though – Dan Oct 29 '18 at 15:57
  • You're right. I was getting a weird result with the `==` where instead of returning an array it was returning a boolean. I attributed it to broadcasting acting up, but it was a shape mismatch. – Hans Musgrave Oct 29 '18 at 16:17
  • The short circuiting with the generator comprehension is interesting. I wonder if numpy is smart enough to do something similar under the hood. But it's definitely something I'll keep in mind in the future. – Dan Oct 29 '18 at 16:26
  • @Dan Last I checked it isn't short-circuited internally, and there were discussions about whether it was needed or not. Some other SO questions discussed using things like numba to speed up checks like this and allow short-circuiting. – Hans Musgrave Oct 29 '18 at 18:54
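The behaviour of `B in A` discussed in these comments can be checked directly. The sketch below assumes that `in` on an ndarray behaves like `(A == B).any()`, which matches my understanding of NumPy but is stated here as an assumption rather than a documented guarantee. It also shows the `np.isin(B, A).all()` reading for comparison.

```python
import numpy as np

A = np.array([[2, 3, 4], [5, 6, 7]])

for B in ([2, 3, 4], [2, 2, 2], [1, 3, 1], [9, 9, 9]):
    membership = B in A                       # what the question tried
    any_elementwise = bool((A == B).any())    # any single position matching
    isin_all = bool(np.isin(B, A).all())      # every value of B somewhere in A
    print(B, membership, any_elementwise, isin_all)
```

For `[1, 3, 1]` the first two columns of the printout are `True` while the `np.isin` column is `False`, which is the counterexample given in the first comment.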
3

You can do it by using broadcasting like this:

import numpy as np
A = np.array([[2,3,4],[5,6,7]])
B = np.array([2,3,4]) # Or [2,3,4], a list will work fine here too

(B==A).all(axis=1).any()
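Breaking the one-liner into its intermediate steps may make it easier to follow. This is only an unrolled version of the same expression; the variable names are chosen for illustration.

```python
import numpy as np

A = np.array([[2, 3, 4], [5, 6, 7]])
B = np.array([2, 3, 4])

# Step 1: B is broadcast against every row of A, giving a 2x3 boolean array.
elementwise = (B == A)

# Step 2: reduce along axis 1, so each row yields True only if all
# of its elements matched B.
row_matches = elementwise.all(axis=1)

# Step 3: True if at least one row matched completely.
result = row_matches.any()

print(elementwise.tolist())  # [[True, True, True], [False, False, False]]
print(row_matches.tolist())  # [True, False]
print(bool(result))          # True
```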
Dan
1

Use the built-in any(). As soon as a matching row is found, it stops iterating and returns True.

import numpy as np

A = np.array([[2,3,4],[5,6,7]])
B = [3,2,4]

if any(np.array_equal(B, x) for x in A):
  print(f'{B} inside {A}')
else:
  print(f'{B} NOT inside {A}')
Bram Vanroy
0

You need to use .all() so that every element of the row is compared with the list.

A = np.array([[2,3,4],[5,6,7]])
B = [2,3,4]

for i in A:
    if (i==B).all():
        print ("Yes, B is present in A")
        break

EDIT: I added a break to exit the loop as soon as the first occurrence is found. This matters for examples such as A = np.array([[2,3,4],[2,3,4]]), where the message would otherwise be printed twice.

# print ("Yes, B is present in A")

Alternative solution using any:

any((i==B).all() for i in A)

# True
Sheldore
  • Suppose A was `np.array([[2,3,4],[2,3,4]])`. Then this will print `Yes, B is present in A` twice. Is there any way to make it print only once? – Kevin Oct 29 '18 at 15:08
  • @Kevin: You can put a `break` as soon as it is present – Sheldore Oct 29 '18 at 15:08
  • @Kevin wrap in an `np.any()`. Also there is no need for a loop here, just use broadcasting – Dan Oct 29 '18 at 15:09
0
list((A[[i], :]==B).all() for i in range(A.shape[0])) 

[True, False]

This will tell you which rows of A are equal to B.
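If row indices are more useful than a list of booleans, a broadcasting variant along the following lines should work. It is a sketch rather than part of the original answer; `np.flatnonzero` is used to turn the boolean mask into indices.

```python
import numpy as np

A = np.array([[2, 3, 4], [5, 6, 7], [2, 3, 4]])
B = [2, 3, 4]

matches = (A == B).all(axis=1)           # one boolean per row of A
matching_rows = np.flatnonzero(matches)  # indices of the rows equal to B

print(matches.tolist())        # [True, False, True]
print(matching_rows.tolist())  # [0, 2]
```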

Khalil Al Hooti
0

A straightforward option is to use any() over a generator that compares each row with array_equal.

from numpy import array_equal
import numpy as np

A = np.array([[2,3,4],[5,6,7]])
B = np.array([2,2,4]) 

in_A = lambda x, A : any((array_equal(a,x) for a in A))

print(in_A(B, A))
False

Subham