47

Has anyone ever come up against this problem? Let's say you have two arrays like the following:

a = array([1,2,3,4,5,6])
b = array([1,4,5])

Is there a way to find which elements of a exist in b? For example,

c = a == b # Wishful example here
print c
array([1,4,5])
# Or even better
array([True, False, False, True, True, False])

I'm trying to avoid loops as it would take ages with millions of elements. Any ideas?

Cheers

ebressert

6 Answers

62

Actually, there's an even simpler solution than any of these:

import numpy as np

a = np.array([1,2,3,4,5,6])
b = np.array([1,4,5])

c = np.in1d(a,b)

The resulting c is then:

array([ True, False, False,  True,  True, False], dtype=bool)
eteq
  • 6
    Is there an "almost_equal" version of this? Where you can specify the condition used to test for equality? – endolith Feb 09 '13 at 01:44
  • 5
    This is now deprecated. The NumPy documentation states "We recommend using `isin` instead of `in1d` for new code." – divenex May 15 '18 at 16:13
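Following up on the two comments above, here is a minimal sketch of the `np.isin` form, plus one possible tolerance-based variant. The `np.isclose` broadcasting trick and the `atol` value are my own assumptions, not a built-in "almost equal" membership function. It builds a len(a) x len(b) intermediate array, so it is only reasonable when b is small.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([1, 4, 5])

# Exact membership test; for 1-D input this matches np.in1d(a, b)
mask = np.isin(a, b)
print(mask)     # [ True False False  True  True False]
print(a[mask])  # [1 4 5]

# Approximate membership (assumed approach): compare every element of a_f
# against every element of b_f with an absolute tolerance, then reduce.
a_f = np.array([1.0, 2.0, 4.0000001])
b_f = np.array([1.0, 4.0])
close_mask = np.isclose(a_f[:, None], b_f, rtol=0.0, atol=1e-6).any(axis=1)
print(close_mask)  # [ True False  True]
```

The boolean mask can be used directly for indexing, which also gives the `array([1,4,5])` form asked for in the question.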
24

Use np.intersect1d.

#!/usr/bin/env python
import numpy as np
a = np.array([1,2,3,4,5,6])
b = np.array([1,4,5])
c=np.intersect1d(a,b)
print(c)
# [1 4 5]

Note that np.intersect1d gives the wrong answer if a or b have nonunique elements. In that case use np.intersect1d_nu.

There are also np.setdiff1d, np.setxor1d, np.setmember1d, and np.union1d. See the NumPy Example List With Doc.
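For anyone reading this with a recent NumPy: as far as I know `intersect1d_nu` and `setmember1d` are no longer available, and `intersect1d` deduplicates its inputs itself unless you pass `assume_unique=True`. A small sketch of the set routines on inputs with repeated values (the sample arrays are made up for illustration):

```python
import numpy as np

# Duplicates and unsorted input on purpose
a = np.array([1, 1, 2, 3, 4, 5, 5, 6])
b = np.array([5, 4, 4, 1])

common = np.intersect1d(a, b)  # sorted unique values present in both
only_a = np.setdiff1d(a, b)    # values in a that are not in b
either = np.setxor1d(a, b)     # values in exactly one of the two
merged = np.union1d(a, b)      # sorted unique values from both

print(common)  # [1 4 5]
print(only_a)  # [2 3 6]
print(either)  # [2 3 6]
print(merged)  # [1 2 3 4 5 6]
```

All four return sorted, unique values, so they answer "which values are shared" rather than "which positions of a match".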

unutbu
3

Thanks for your reply kaizer.se. It's not quite what I was looking for, but with a suggestion from a friend and what you said I came up with the following.

import numpy as np

a = np.array([1,4,5]).astype(np.float32)
b = np.arange(10).astype(np.float32)

# Assigning matching values from a in b as np.nan
b[b.searchsorted(a)] = np.nan

# Now generating Boolean arrays
match = np.isnan(b)
nonmatch = ~match

It's a bit of a cumbersome process, but it beats writing loops or using weave with loops.

Cheers

u0b34a0f6ae
ebressert
  • Problem with this approach is that it returns indexes even for values in `a` that don't exist in `b` (as well as duplicate indexes in other cases). For example: `numpy.searchsorted([1, 2], [1.2, 1.3])` returns `[1, 1]` which is not suitable for the OP. – sirfz Jun 16 '16 at 16:39
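To make the pitfall from the comment concrete, here is a small sketch (the value 1.5 is an illustrative input I picked, not from the original answer). `searchsorted` returns an insertion point, not a match, so a value missing from b still marks a neighbouring element:

```python
import numpy as np

a = np.array([1.5, 4, 5], dtype=np.float32)  # 1.5 does not occur in b
b = np.arange(10, dtype=np.float32)

idx = b.searchsorted(a)  # insertion points, regardless of whether a match exists
print(idx)               # [2 4 5]

b[idx] = np.nan
match = np.isnan(b)
# Position 2 (originally 2.0) is flagged although 2.0 was never in a
print(match[2])          # True
```

A value larger than everything in b is worse still: the insertion point equals len(b), so the assignment raises an IndexError.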
3

Numpy has a set function numpy.setmember1d() that works on sorted and uniqued arrays and returns exactly the boolean array that you want. If the input arrays don't match the criteria you'll need to convert to the set format and invert the transformation on the result.

import numpy as np
a = np.array([6,1,2,3,4,5,6])
b = np.array([1,4,5])

# convert to the uniqued form
a_set, a_inv = np.unique1d(a, return_inverse=True)
b_set = np.unique1d(b)
# calculate matching elements
matches = np.setmember1d(a_set, b_set)
# invert the transformation
result = matches[a_inv]
print(result)
# [False  True False False  True  True False]

Edit: Unfortunately the setmember1d method in numpy is really inefficient. The search sorted and assign method you proposed works faster, but if you can assign directly you might as well assign directly to the result and avoid lots of unnecessary copying. Also your method will fail if b contains anything not in a. The following corrects those errors:

# Assumes a is sorted and contains no duplicates
result = np.zeros(a.shape, dtype=bool)
idxs = a.searchsorted(b)
in_range = idxs < a.shape[0]  # Filter out out-of-range values
idxs, b_in = idxs[in_range], b[in_range]  # Keep b aligned with the filtered indices
idxs = idxs[a[idxs] == b_in]  # Filter out where there isn't an actual match
result[idxs] = True
print(result)

My benchmarks show this at 91us vs. 6.6ms for your approach and 109ms for numpy setmember1d on 1M element a and 100 element b.
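If you want to reproduce a comparison like the one above on a current NumPy, a rough harness could look like the sketch below. The function name `membership_sorted`, the array sizes, and the choice of `np.isin` as the baseline are my assumptions. I have not included timings because they depend heavily on the machine and NumPy version.

```python
import timeit
import numpy as np

def membership_sorted(a, b):
    """Boolean mask of a's elements found in b; assumes a is sorted and unique."""
    result = np.zeros(a.shape, dtype=bool)
    idxs = np.searchsorted(a, b)
    in_range = idxs < a.shape[0]                    # drop insertion points past the end
    idxs, candidates = idxs[in_range], b[in_range]  # keep b aligned with idxs
    result[idxs[a[idxs] == candidates]] = True      # keep exact matches only
    return result

rng = np.random.default_rng(0)
a = np.arange(0, 2_000_000, 2)            # 1M sorted, unique even numbers
b = rng.integers(0, 2_100_000, size=100)  # includes odd and out-of-range values

# Both approaches should agree before timing them
assert np.array_equal(membership_sorted(a, b), np.isin(a, b))

for name, fn in [("searchsorted", membership_sorted), ("np.isin", np.isin)]:
    seconds = timeit.timeit(lambda: fn(a, b), number=10) / 10
    print(f"{name}: {seconds * 1e6:.0f} us per call")
```

The searchsorted variant only works under the sorted-and-unique assumption, so the built-in is the safer default unless profiling shows it is the bottleneck.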

Ants Aasma
  • That's a nice solution. I'll try out your suggestion and what I just wrote to see what's more optimal in speed. Many thanks everyone for your help! – ebressert Oct 23 '09 at 13:56
  • The method I wrote is a bit faster. For a 10000 element array the time it took using timeit in iPython is roughly 3 µs. The setmember1d method took 3 ms. I think your method is more elegant, but I need the speed. – ebressert Oct 23 '09 at 14:06
  • you forgot to close a parenthesis in the 3rd line. you should fix it before some computer science professor notices it... – dalloliogm Oct 23 '09 at 15:09
  • ebressert: seems that you're right, setmember1d has an absolutely terrible implementation in numpy. But the method you're using seems to be using nan values for no good reason, you might just as well use the result array directly. I'll edit with the corresponding example. – Ants Aasma Oct 23 '09 at 15:27
  • Ants Aasma: Your edit is good. I implemented pieces of it to my code and increased the speed once more. Rather than doing nans I put in -1 and then filtered on match = b >= 0. I'm dealing with indexing in my case so there are no indexes of -1. That's why I used np.nan which would work for the more general case. Thanks for your input. My code is really flying now. – ebressert Oct 23 '09 at 16:15
0

ebressert, your answer won't work unless a is a subset of b (and a and b are sorted). Otherwise searchsorted will return false indices. I had to do something similar, and combining that with your code:

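import numpy

# Assume a and b are sorted, and b has a float dtype so it can hold NaN
idxs = numpy.mod(b.searchsorted(a), len(b))  # wrap out-of-range indices back into b
idxs = idxs[b[idxs] == a]  # keep only indices where the value really matches
b[idxs] = numpy.nan
match = numpy.isnan(b)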
AFoglia
-3

Your example implies set-like behavior: it cares more about existence in the array than about having the right element in the right place. NumPy's mathematical arrays and matrices work differently; the == comparison only tells you about items at exactly the same position. Can you make that work for you?

>>> import numpy
>>> a = numpy.array([1,2,3])
>>> b = numpy.array([1,3,3])
>>> a == b
array([ True, False,  True], dtype=bool)
u0b34a0f6ae
  • sorry, this example doesn't work if you try it; moreover, you would have to sort the arrays first. – dalloliogm Oct 23 '09 at 14:18
  • @dalloligom: Uh, I copied from my interactive session so at least it works exactly like that for some version of Python and Numpy. – u0b34a0f6ae Oct 23 '09 at 14:46
  • ok, but it doesn't work if the two arrays have different lengths; in any case, you have to sort them first (try array([1,2,3])==array([2,3,1])). He wants to know which elements of one array exist in another. – dalloliogm Oct 23 '09 at 15:07
  • and by the way, even sorting the arrays won't work... you have to use a set structure. – dalloliogm Oct 23 '09 at 15:11
  • @dalloliogm: Did you read my answer? Does it seem like I didn't understand all that? – u0b34a0f6ae Oct 23 '09 at 18:02
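To illustrate the difference discussed in these comments, here is a short sketch contrasting positional comparison with membership. The broadcasting form is my own illustration, not something from the answer, and it allocates a len(a) x len(b) array:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 1])

positional = a == b                       # compares a[i] with b[i] only
membership = np.isin(a, b)                # does a[i] occur anywhere in b?
pairwise = (a[:, None] == b).any(axis=1)  # same question via broadcasting

print(positional)  # [False False False]
print(membership)  # [ True  True  True]
print(pairwise)    # [ True  True  True]
```

Every element of a is present in b, yet the positional comparison reports no matches because nothing lines up index by index.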