36

I have two numpy arrays, A and B. A conatains unique values and B is a sub-array of A. Now I am looking for a way to get the index of B's values within A.

For example:

A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
# I need a function fun() that:
fun(A,B)
>> 0,6,9
Divakar
  • 218,885
  • 19
  • 262
  • 358
farhawa
  • 10,120
  • 16
  • 49
  • 91

4 Answers4

42

You can use np.in1d with np.nonzero -

np.nonzero(np.in1d(A,B))[0]

You can also use np.searchsorted, if you care about maintaining the order -

np.searchsorted(A,B)

For a generic case, when A & B are unsorted arrays, you can bring in the sorter option in np.searchsorted, like so -

sort_idx = A.argsort()
out = sort_idx[np.searchsorted(A,B,sorter = sort_idx)]

I would add in my favorite broadcasting too in the mix to solve a generic case -

np.nonzero(B[:,None] == A)[1]

Sample run -

In [125]: A
Out[125]: array([ 7,  5,  1,  6, 10,  9,  8])

In [126]: B
Out[126]: array([ 1, 10,  7])

In [127]: sort_idx = A.argsort()

In [128]: sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
Out[128]: array([2, 4, 0])

In [129]: np.nonzero(B[:,None] == A)[1]
Out[129]: array([2, 4, 0])
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • 1
    This assumes we don't care about order; it returns the same results for B=[1,7,10] as for [1, 10, 7]. (The OP's question doesn't specify either way.) – DSM Nov 12 '15 at 18:24
  • @DSM Helpful comments! Thanks! Made it generic. – Divakar Nov 12 '15 at 18:43
  • 1
    the docs suggest to use now [`isin`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.isin.html#numpy.isin) instead of `in1d` – bobrobbob Jun 25 '18 at 15:58
11

Have you tried searchsorted?

A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])

A.searchsorted(B)
# array([0, 6, 9])
Bi Rico
  • 25,283
  • 3
  • 52
  • 75
2

Just for completeness: If the values in A are non negative and reasonably small:

lookup = np.empty((np.max(A) + 1), dtype=int)
lookup[A] = np.arange(len(A))
indices  = lookup[B]
Paul Panzer
  • 51,835
  • 3
  • 54
  • 99
0

I had the same question these days. However, the timing performance is very critical for me. Therefore, I guess the timing comparison of different solutions may be useful for others.

As Divakar mentioned, you can use np.in1d(A, B) with np.where, np.nonzero. Moreover, you can use the np.in1d(A, B) with np.intersect1d (based on this page). Also, you can use np.searchsorted as another useful approach for sorted arrays.

I want to add another simple solution. You can use the comprehension list. It may take longer that the previous ones. However, if you take the advantage of Numba python package, it is much less time-consuming.

In [1]: import numpy as np
In [2]: from numba import njit

In [3]: a = np.array([1,2,3,4,5,6,7,8,9,10])
In [4]: b = np.array([1,7,10])

In [5]: np.where(np.in1d(a, b))[0]
   ...: array([0, 6, 9])

In [6]: np.nonzero(np.in1d(a, b))[0]
   ...: array([0, 6, 9])   

In [7]: np.searchsorted(a, b)
   ...: array([0, 6, 9])


In [8]: np.searchsorted(a, np.intersect1d(a, b))
   ...: array([0, 6, 9])

In [9]: [i for i, x in enumerate(a) if x in b]
   ...: [0, 6, 9]

In [10]: @njit
    ...: def func(a, b):
    ...:     return [i for i, x in enumerate(a) if x in b]

In [11]: func(a, b)
   ...: [0, 6, 9]

Now, let's compare the timing performance of these solutions.

In [12]: %timeit np.where(np.in1d(a, b))[0]
4.26 µs ± 6.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [13]: %timeit np.nonzero(np.in1d(a, b))[0]
4.39 µs ± 14.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [14]: %timeit np.searchsorted(a, b)
800 ns ± 6.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [15]: %timeit np.searchsorted(a, np.intersect1d(a, b))
8.8 µs ± 73.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [16]: %timeit [i for i, x in enumerate(a) if x in b]
15.4 µs ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [17]: %timeit func(a, b)
336 ns ± 0.579 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Mohammadreza Riahi
  • 502
  • 1
  • 6
  • 20