True returned in both cases, when it contains an element and when not. Why?

Question

Let's say I've created an integer 2d array:

import numpy as np
ar1 = np.random.randint(10, size=(4,2))
v1 = ar1[0]
v2 = [4,4]
ar1 = np.array([[5, 7],
                [7, 5],
                [9, 2],
                [0, 1]])

I want to check if v1 and v2 are elements of ar1. By 'elements' I mean 'rows':

v1 in ar1
v2 in ar1

And I get True in both cases. What am I doing wrong? Is there a better way to check if the vector matches a row of the array? Looping through rows (i.e. for rows in ar1:) is not an option.

EDIT: another way is to sum the matching values in every row and check whether the sum is 2, but that's lame and unpythonic.

ar2 is not defined here, so for all we know, v2 is actually in ar2 — Julien Spronck, Mar 29 '15 at 17:24
*"looping through rows... is not an option"* what do you think the `in` operation does in the case of an unstructured array/list? — aruisdante, Mar 29 '15 at 17:30

wwii · Accepted Answer · 2015-03-29T17:59:56.233

2

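You can combine np.all and np.any with an elementwise comparison: np.all(..., axis=1) finds the rows where every column matches, and np.any checks whether at least one such row exists. The row you are checking for must be broadcastable against the array (see Array Broadcasting in numpy).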

import numpy as np
v1 = np.array([9,2])
v2 = np.array([2,9])
v3 = np.array([9,4])
ar1 = np.array([[5, 7],
                [7, 5],
                [9, 2],
                [0, 1]])

>>> ar1 == v1
array([[False, False],
       [False, False],
       [ True,  True],
       [False, False]], dtype=bool)
>>> ar1 == v2
array([[False, False],
       [False, False],
       [False, False],
       [False, False]], dtype=bool)
>>> ar1 == v3
array([[False, False],
       [False, False],
       [ True, False],
       [False, False]], dtype=bool)
>>> np.any(np.all(ar1 == v1 , axis = 1)), np.any(np.all(ar1 == v2, axis = 1)), np.any(np.all(ar1 == v3, axis = 1))
(True, False, False)
>>>
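The one-liner above could be wrapped in a small helper. This is only a minimal sketch; the name `contains_row` is made up for illustration and is not a NumPy function.

```python
import numpy as np

def contains_row(arr, row):
    """Return True if `row` equals at least one full row of 2-D `arr`."""
    row = np.asarray(row)
    # arr == row broadcasts `row` against every row of `arr`;
    # all(axis=1) keeps the rows where every column matched;
    # any() asks whether at least one such row exists.
    return bool(np.any(np.all(arr == row, axis=1)))

ar1 = np.array([[5, 7],
                [7, 5],
                [9, 2],
                [0, 1]])

print(contains_row(ar1, [9, 2]))  # True
print(contains_row(ar1, [2, 9]))  # False
print(contains_row(ar1, [9, 4]))  # False
```

Wrapping the result in bool() returns a plain Python bool rather than a NumPy scalar, which is a little friendlier in `if` statements and assertions.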

edited Mar 29 '15 at 17:59

answered Mar 29 '15 at 17:53


Neat, this works well – Alex Mar 31 '15 at 19:33
@Alex, Do you understand how it works - so you can adapt it? – wwii Apr 01 '15 at 03:46
in fact the absolutely best way is to use set(): then you can use a in b (set), and it does exactly what I want – Alex Apr 01 '15 at 10:06
@Alex if you think that is the best solution, you could post it as an answer with examples - it's OK to answer your own question. – wwii Apr 07 '15 at 03:50
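The set() idea from the comments was never posted with code. One way it might look, as a sketch: NumPy rows are not hashable, so each row is converted to a tuple first (that conversion is an assumption about what was meant, and `row_set` is just an illustrative name).

```python
import numpy as np

ar1 = np.array([[5, 7],
                [7, 5],
                [9, 2],
                [0, 1]])

# ndarray rows are unhashable, so store each row as a tuple of Python ints.
row_set = set(map(tuple, ar1.tolist()))

print((9, 2) in row_set)  # True
print((4, 4) in row_set)  # False

# A NumPy vector has to be converted the same way before the lookup.
v = np.array([7, 5])
print(tuple(v.tolist()) in row_set)  # True
```

Building the set costs one pass over the data, so this probably only pays off when many rows are looked up against the same array.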

Julien Spronck · Answer 2 · 2015-03-29T17:29:41.633

1

If they were lists instead of numpy arrays, this would work.

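import numpy as np

ar1 = np.array([[5, 7],
                [7, 5],
                [9, 2],
                [0, 1]])

# `in` on the numpy array: True even when no row matches
[7, 5] in ar1  # True
[7, 6] in ar1  # True

# `in` on a plain list of lists: compares whole rows
[7, 5] in ar1.tolist()  # True
[7, 6] in ar1.tolist()  # False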
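The surprising True for [7, 6] on the array is consistent with how `in` behaves on an ndarray: it appears to amount to an elementwise == followed by any(), so one matching element anywhere is enough. A short sketch to illustrate, using the same array:

```python
import numpy as np

ar1 = np.array([[5, 7],
                [7, 5],
                [9, 2],
                [0, 1]])

# Elementwise comparison, with [7, 6] broadcast across every row.
mask = ar1 == [7, 6]

print(mask.any())              # True: the 7 in row [7, 5] matches column 0
print([7, 6] in ar1)           # True, the same result as mask.any()
print(mask.all(axis=1).any())  # False: no row matches in every column
```

The last line is the check the question actually wants: a row counts only if all of its columns match.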

edited Mar 29 '15 at 17:29

answered Mar 29 '15 at 17:27


@Shashank if your statement is true, can you explain the result I get then? – Julien Spronck Mar 29 '15 at 17:33
I'd be happy to remove my answer but I'd love an explanation – Julien Spronck Mar 29 '15 at 17:38
See this answer: http://stackoverflow.com/questions/14766194/testing-whether-a-numpy-array-contains-a-given-row . Basically, numpy's `in` does *not* do what you expect it to do for non-single-element searches. – aruisdante Mar 29 '15 at 17:39
Ok, thanks @aruisdante but the solution still seems valid (it is also suggested in the most popular answer on the post you linked to) – Julien Spronck Mar 29 '15 at 17:41
@JulienSpronck Yes, but `tolist()` is likely *O(n)*, and then you still have to perform another *O(n)* search for the `in` operator. At that point you might as well just do a `for` on the original input matrix and save an iteration over the data. – aruisdante Mar 29 '15 at 17:44
right, that's true ... probably not the fastest. Thanks again, @aruisdante – Julien Spronck Mar 29 '15 at 17:45
@JulienSpronck However, both `tolist()` and `in` (for `list`) are implemented in C, so it's possible that even though it requires two iterations over the data, it still winds up being faster for many real datasets. You'd have to benchmark to find out for sure. – aruisdante Mar 29 '15 at 17:47
@aruisdante ... for all I know, the data was converted from a list to a numpy array in the first place :-) – Julien Spronck Mar 29 '15 at 17:48
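The benchmark suggested in the comments could be set up roughly like this. It is a sketch only: the array size, the searched row, and the repeat count are arbitrary assumptions, and timings vary by machine and NumPy version, so no results are shown.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
big = rng.integers(0, 10, size=(100_000, 2))  # arbitrary test data
row = [4, 4]

def via_tolist():
    # Convert to a list of lists, then let list `in` compare whole rows.
    return row in big.tolist()

def via_numpy():
    # Vectorised comparison: all columns match in at least one row.
    return bool(np.any(np.all(big == row, axis=1)))

# Both approaches should agree before their speed is compared.
assert via_tolist() == via_numpy()

for fn in (via_tolist, via_numpy):
    seconds = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```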
