
Let's say I've created an integer 2d array:

import numpy as np
ar1 = np.random.randint(10, size=(4,2))
v1 = ar1[0]
v2 = [4,4]
ar1 = np.array([[5, 7],
                [7, 5],
                [9, 2],
                [0, 1]])

I want to check if v1 and v2 are elements of ar1. By 'elements' I mean 'rows':

v1 in ar1
v2 in ar1

I get True in both cases. What am I doing wrong? Is there a better way to check whether a vector matches a row of the array? Looping through the rows (i.e. for row in ar1:) is not an option.

EDIT: Another way is to sum the matching values in every row and check whether the sum is 2, but that's lame and unpythonic.
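For reference, the sum-based idea from the EDIT could look roughly like the sketch below. The helper name row_match_by_sum is made up for illustration, and the check compares the per-row match count against the number of columns rather than a hard-coded 2.

```python
import numpy as np

ar1 = np.array([[5, 7],
                [7, 5],
                [9, 2],
                [0, 1]])

def row_match_by_sum(arr, row):
    # Hypothetical helper: count elementwise matches in each row,
    # then check whether any row matched in every column.
    matches_per_row = (arr == row).sum(axis=1)
    return bool((matches_per_row == arr.shape[1]).any())

print(row_match_by_sum(ar1, [9, 2]))  # True
print(row_match_by_sum(ar1, [4, 4]))  # False
```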

Alex

2 Answers


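You can combine np.all and np.any on an elementwise comparison: np.all(..., axis=1) tests whether every column of a row matches, and np.any tests whether at least one row passed. The row you are checking for must be broadcastable against the array (see Array Broadcasting in numpy).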

import numpy as np
v1 = np.array([9,2])
v2 = np.array([2,9])
v3 = np.array([9,4])
ar1 = np.array([[5, 7],
                [7, 5],
                [9, 2],
                [0, 1]])

>>> ar1 == v1
array([[False, False],
       [False, False],
       [ True,  True],
       [False, False]], dtype=bool)
>>> ar1 == v2
array([[False, False],
       [False, False],
       [False, False],
       [False, False]], dtype=bool)
>>> ar1 == v3
array([[False, False],
       [False, False],
       [ True, False],
       [False, False]], dtype=bool)
>>> np.any(np.all(ar1 == v1 , axis = 1)), np.any(np.all(ar1 == v2, axis = 1)), np.any(np.all(ar1 == v3, axis = 1))
(True, False, False)
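If this check is needed in several places, it can be wrapped in a small function. This is only a sketch: the name contains_row and the shape guard are my own additions, not part of NumPy.

```python
import numpy as np

def contains_row(arr, row):
    """Return True if `row` equals some row of the 2-D array `arr`."""
    arr = np.asarray(arr)
    row = np.asarray(row)
    # Guard (an assumption of this sketch): only accept a 1-D row whose
    # length equals the number of columns, so broadcasting cannot
    # silently compare a scalar or a mismatched shape.
    if arr.ndim != 2 or row.shape != (arr.shape[1],):
        return False
    return bool(np.any(np.all(arr == row, axis=1)))

ar1 = np.array([[5, 7],
                [7, 5],
                [9, 2],
                [0, 1]])

print(contains_row(ar1, [9, 2]))  # True
print(contains_row(ar1, [2, 9]))  # False
print(contains_row(ar1, [9, 4]))  # False
```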
wwii

If they were lists instead of numpy arrays, this would work.

ar1 = np.array([[5, 7],
                [7, 5],
                [9, 2],
                [0, 1]])

[7, 5] in ar1 ## True
[7, 6] in ar1 ## True

[7, 5] in ar1.tolist() ## True
[7, 6] in ar1.tolist() ## False
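As far as I can tell, the reason [7, 6] in ar1 comes out True is that `in` on a numpy array behaves like an elementwise comparison followed by .any(), so a single matching element is enough. The snippet below is a small sketch of that behaviour; the variable names are just for illustration.

```python
import numpy as np

ar1 = np.array([[5, 7],
                [7, 5],
                [9, 2],
                [0, 1]])

candidate = [7, 6]

# Elementwise comparison, broadcast across every row of ar1.
elementwise = ar1 == candidate

print(elementwise.any())              # True: the 7 in the first column matches
print(candidate in ar1)               # True: same result as the line above
print(elementwise.all(axis=1).any())  # False: no row matches in both columns
```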
Julien Spronck
  • @Shashank if your statement is true, can you explain the result I get then? – Julien Spronck Mar 29 '15 at 17:33
  • I'd be happy to remove my answer but I'd love an explanation – Julien Spronck Mar 29 '15 at 17:38
  • See this answer: http://stackoverflow.com/questions/14766194/testing-whether-a-numpy-array-contains-a-given-row . Basically, numpy's `in` does *not* do what you expect it to do for non-single-element searches. – aruisdante Mar 29 '15 at 17:39
  • Ok, thanks @aruisdante but the solution still seems valid (it is also suggested in the most popular answer on the post you linked to) – Julien Spronck Mar 29 '15 at 17:41
  • @JulienSpronck Yes, but `tolist()` is likely *O(n)*, and then you still have to perform another *O(n)* search for the `in` operator. At that point you might as well just do a `for` on the original input matrix and save an iteration over the data. – aruisdante Mar 29 '15 at 17:44
  • right, that's true ... probably not the fastest. Thanks again, @aruisdante – Julien Spronck Mar 29 '15 at 17:45
  • @JulienSpronck However, both `tolist()` and `in` (for `list`) are implemented in C, so it's possible that even though it requires two iterations over the data, it still winds up being faster for many real datasets. You'd have to benchmark to find out for sure. – aruisdante Mar 29 '15 at 17:47
  • @aruisdante ... for all I know, the data was converted from a list to a numpy array in the first place :-) – Julien Spronck Mar 29 '15 at 17:48
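To settle the speed question raised in the comments, one would have to measure. A rough benchmarking sketch with timeit follows; the array size, the target row and the function names are arbitrary choices for illustration, and the timings will depend on the machine, the NumPy version and the shape of the data.

```python
import timeit
import numpy as np

# Assumed test data: 100,000 random rows of two digits each.
rng = np.random.default_rng(0)
big = rng.integers(0, 10, size=(100_000, 2))
target = [3, 7]

def via_numpy():
    # Vectorised row test from the first answer.
    return bool(np.any(np.all(big == target, axis=1)))

def via_tolist():
    # Convert to nested Python lists, then use list membership.
    return target in big.tolist()

# Both approaches should agree before their timings are compared.
assert via_numpy() == via_tolist()

runs = 20
for fn in (via_numpy, via_tolist):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```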