Check if each element in a numpy array is in another array

Question

This problem seems easy but I cannot quite get a nice-looking solution. I have two numpy arrays (A and B), and I want to get the indices of A where the elements of A are in B and also get the indices of A where the elements are not in B.

So, if

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6])

Currently I am using

C = np.searchsorted(A,B)

which takes advantage of the fact that A is in order, and gives me [1, 3, 5], the indices of the elements of A that are in B. This is great, but how do I get D = [0,2,4,6], the indices of the elements of A that are not in B?

score 40 · Answer 1 · edited Feb 11 '14 at 21:09

searchsorted may give you the wrong answer if not every element of B is in A. You can use numpy.in1d:

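import numpy as np

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6,8])
mask = np.in1d(A, B)  # newer NumPy recommends np.isin(A, B) instead
print(np.where(mask)[0])   # indices of A whose elements are in B
print(np.where(~mask)[0])  # indices of A whose elements are not in B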

output is:

[1 3 5]
[0 2 4 6]
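To make the searchsorted caveat concrete, here is a minimal sketch using the same A and B as above. It uses np.isin, which the NumPy documentation recommends over in1d for new code; the variable names pos, in_B and not_in_B are only illustrative.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([2, 4, 6, 8])  # 8 is not in A

# searchsorted returns insertion points, not membership: 8 maps to index 7,
# which is one past the end of A.
pos = np.searchsorted(A, B)
print(pos)  # [1 3 5 7]

# np.isin tests membership directly, so the extra 8 has no effect.
mask = np.isin(A, B)
in_B = np.flatnonzero(mask)       # indices of A whose elements are in B
not_in_B = np.flatnonzero(~mask)  # indices of A whose elements are not in B
print(in_B)      # [1 3 5]
print(not_in_B)  # [0 2 4 6]
```

Indexing A with pos here would raise an IndexError because of the out-of-range 7, which is the failure mode described above.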

However, in1d() sorts its input, which is slow for large datasets. You can use pandas if your dataset is large:

import pandas as pd
np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]
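A small sketch of how the pandas one-liner gets both index sets, assuming pandas is installed. get_indexer returns -1 for every element of A that is missing from the index built on B, so the sign of the result is the membership test. The names positions, in_B and not_in_B are illustrative.

```python
import numpy as np
import pandas as pd

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([2, 4, 6, 8])

# get_indexer needs a unique index, hence pd.unique(B). For each element of A
# it returns the position in that index, or -1 when the element is absent.
positions = pd.Index(pd.unique(B)).get_indexer(A)

in_B = np.flatnonzero(positions >= 0)     # indices of A found in B
not_in_B = np.flatnonzero(positions < 0)  # indices of A not found in B
print(in_B)      # [1 3 5]
print(not_in_B)  # [0 2 4 6]
```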

Here is the time comparison:

A = np.random.randint(0, 1000, 10000)
B = np.random.randint(0, 1000, 10000)

%timeit np.where(np.in1d(A, B))[0]
%timeit np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]

output:

100 loops, best of 3: 2.09 ms per loop
1000 loops, best of 3: 594 µs per loop

It's good to know about this efficient method because my datasets are very large. Thanks so much for this solution! — DanHickstein, Apr 11 '13 at 22:05

score 8 · Accepted Answer · answered Apr 11 '13 at 02:40

import numpy as np

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6])
C = np.searchsorted(A, B)

# np.alen is no longer available in recent NumPy releases; len(A) is equivalent for a 1-D array
D = np.delete(np.arange(len(A)), C)

D
#array([0, 2, 4, 6])

answered by askewchan

Thanks! I also like the answer provided by alexhb using np.setdiff1d. I was hoping that there was a function that would give me the indices directly, but this works just fine. – DanHickstein Apr 11 '13 at 02:54
There might be, @Dan, but I can't think of it. If you don't need `C`, use his solution, but mine will be twice as fast if you've already got `C`. – askewchan Apr 11 '13 at 02:55
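If C is already available, another way to derive D from it is a boolean mask. This is only a sketch of an alternative to np.delete, not something proposed in the thread, and the name keep is illustrative. It relies on the same assumption as the answer: every element of B is present in the sorted array A.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([2, 4, 6])
C = np.searchsorted(A, B)  # valid here because every element of B is in A

# Start with every index marked as "not in B", then clear the positions in C.
keep = np.ones(A.size, dtype=bool)
keep[C] = False
D = np.flatnonzero(keep)
print(D)  # [0 2 4 6]
```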

score 7 · Answer 3 · answered Apr 11 '13 at 02:48

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])
b = np.array([2, 4, 6])
c = np.searchsorted(a, b)
d = np.searchsorted(a, np.setdiff1d(a, b))

d
#array([0, 2, 4, 6])

answered by alexhb

Having to search twice slows this down a bit; it is better to use the already known `C` to get `D`. But this is of course the better solution if `C` is not needed, so +1. (Welcome to Stack Overflow!) – askewchan Apr 11 '13 at 02:53
Should the `c` line be deleted? It is not doing anything here. – crypdick Mar 20 '23 at 16:27

score 5 · Answer 4 · edited Jun 20 '20 at 09:12

The elements of A that are also in B:

set(A) & set(B)

The elements of A that are not in B:

set(A) - set(B)

answered May 11 '17 at 15:22 by Ben Zweig

This does not answer the question (which asks for indices, not elements). However, if you want to perform the above operations on numpy arrays, do not convert them to sets; use numpy operations instead. See [intersect1d](https://numpy.org/doc/stable/reference/generated/numpy.intersect1d.html?highlight=intersect1d#numpy.intersect1d) and [setdiff1d](https://numpy.org/doc/stable/reference/generated/numpy.setdiff1d.html) (or possibly [setxor1d](https://numpy.org/doc/stable/reference/generated/numpy.setxor1d.html#numpy.setxor1d)). – Nerxis Aug 18 '20 at 13:51
Thank you, as I was looking for elements not indices and the question title is ambiguous. I appreciate the numpy operations as well. – PhasorLaser Feb 18 '22 at 22:15
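For readers who want the elements instead of the indices, the numpy set routines mentioned in the comment above can be sketched as follows. Both return sorted, unique values. The return_indices option of intersect1d also gives the matching positions in A; the variable names are illustrative.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([2, 4, 6])

common = np.intersect1d(A, B)  # elements of A that are also in B
only_a = np.setdiff1d(A, B)    # elements of A that are not in B
print(common)  # [2 4 6]
print(only_a)  # [1 3 5 7]

# return_indices=True also reports where the common values first occur in A and B.
_, idx_a, idx_b = np.intersect1d(A, B, return_indices=True)
print(idx_a)  # [1 3 5]
```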

score 0 · Answer 5 · answered Mar 20 '23 at 16:52

import numpy as np

all_vals = np.arange(1000)  # `A` in the question
seen_vals = np.unique(np.random.randint(0, 1000, 100))  # `B` in the question
# boolean mask marking the values of `all_vals` that are not in `seen_vals`
mask = np.isin(all_vals, seen_vals, invert=True)
unseen_idx = np.flatnonzero(mask)  # `D` in the question: indices of unseen values
unseen_vals = all_vals[mask]
