
I have two one-dimensional NumPy arrays, A and B, of the same length. I want to find the intersection of the two arrays, meaning I want to find all the elements of A that are also present in B.

The result should be a boolean array that is True at each index where the element of A is also a member of B, preserving order so that I can use the result to index another array.
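
For example, here is a toy illustration of the desired behavior (made-up arrays, not the real data):

import numpy as np

A = np.array([2, 3, 4, 5])
B = np.array([5, 2, 7])
# desired mask: [True, False, False, True]
# so that e.g. other_array[mask] keeps the rows where A's element is in B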

If not for the boolean mask constraint, I would convert both arrays to sets and use the set intersection operator (&). However, I have tried np.isin and np.in1d, and found that a plain Python list comprehension is much faster.

Given the setup:

import numba
import numpy as np

primes = np.array([
    2,   3,   5,   7,  11,  13,  17,  19,  23,  29,  31,  37,  41,
    43,  47,  53,  59,  61,  67,  71,  73,  79,  83,  89,  97, 101,
    103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167,
    173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239,
    241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313,
    317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397,
    401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467,
    479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569,
    571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643,
    647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733,
    739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823,
    827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911,
    919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997],
    dtype=np.int64)

@numba.vectorize(nopython=True, cache=True, fastmath=True, forceobj=False)
def reverse_digits(n, base):
    out = 0
    while n:
        n, rem = divmod(n, base)
        out = out * base + rem
    return out

flipped = reverse_digits(primes, 10)

def set_isin(a, b):
    return a in b

vec_isin = np.vectorize(set_isin)

primes contains all 168 prime numbers below 1000. I chose it because it is of decent size and predetermined. I have run various tests:

In [2]: %timeit np.isin(flipped, primes)
51.3 µs ± 1.55 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [3]: %timeit np.in1d(flipped, primes)
46.2 µs ± 386 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [4]: %timeit setp = set(primes)
12.9 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [5]: %timeit setp = set(primes.tolist())
6.84 µs ± 175 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [6]: %timeit setp = set(primes.flat)
11.5 µs ± 54.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [7]: setp = set(primes.tolist())

In [8]: %timeit [x in setp for x in flipped]
23.3 µs ± 739 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [9]: %timeit [x in setp for x in flipped.tolist()]
12.1 µs ± 76.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [10]: %timeit [x in setp for x in flipped.flat]
19.7 µs ± 249 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [11]: %timeit vec_isin(flipped, setp)
40 µs ± 317 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [12]: %timeit np.frompyfunc(lambda x: x in setp, 1, 1)(flipped)
25.7 µs ± 418 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [13]: %timeit setf = set(flipped.tolist())
6.51 µs ± 44 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [14]: setf = set(flipped.tolist())

In [15]: %timeit np.array(sorted(setf & setp))
9.42 µs ± 78.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

setp = set(primes.tolist()); [x in setp for x in flipped.tolist()] takes about 19 microseconds in total, which is faster than the NumPy methods. I am wondering why this is the case, and whether there is a way to make it even faster.

(I wrote all the code; I used the AI suggested-edit feature to edit the question.)

Ξένη Γήινος

    `numpy.isin` is designed to run faster for much larger arrays. For example, for primes up to 10,000 (1,229 elements), `numpy.isin` should run twice as fast. So this is a very important question: is your `primes` really a decent size for you? (I.e., do you really want to use it on such small arrays?) – ken Jun 18 '23 at 07:47

1 Answer


Why the provided solutions are not efficient

np.isin has two implementations. The first consists in sorting the two arrays (using a merge sort) and then merging them. This solution runs in O(n log n + m log m + n + m), that is, O(n log n + m log m). The other implementation is based on a lookup table: it creates an array of boolean values from the second array and then checks whether lookupTable[item] is set for each item of the first array. This second implementation can be faster for arrays containing small integers (this is a bit more complicated, but it is explained in the documentation). It runs in O(n + m + max(arr2)) (and even theoretically O(n + m) on some platforms, with a big hidden constant), but it can use much more memory.

NumPy tries to pick the best implementation by default. In your case, the two arrays are small and the integers they contain are also relatively small, so both solutions are relatively fast. For bigger arrays with small integers, the lookup table should be faster.
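
To make the lookup-table idea concrete, here is a minimal sketch in plain NumPy, assuming non-negative integers (the helper name lut_isin is illustrative; this is not NumPy's actual internal code):

import numpy as np

def lut_isin(a, b):
    # table[v] is True iff the value v occurs in b (assumes non-negative ints)
    table = np.zeros(b.max() + 1, dtype=bool)
    table[b] = True
    # Values above b.max() cannot be in b, so guard before indexing the table
    res = np.zeros(a.shape, dtype=bool)
    in_range = a <= b.max()
    res[in_range] = table[a[in_range]]
    return res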

The thing is, NumPy is not efficient here because the overhead of calling a NumPy function like this is large compared to the actual computation. Besides, the second array is already sorted, so sorting it again is wasteful.


Faster implementation

One can simply use a binary search to find each value of the first array in the second one, for example, without allocating large temporary arrays. You can use Numba to reduce the overhead of calling several NumPy functions on small arrays, and even to fill the result faster using a jitted loop. Here is the final implementation:

# Assume primes is sorted in ascending order
@numba.njit('bool_[:](int64[:],int64[:])')
def compute(flipped, primes):
    assert primes.size > 0 and primes.size == flipped.size
    res = np.empty(flipped.size, dtype=np.bool_)
    # Vectorized binary search: idx[i] is the insertion point of flipped[i] in primes
    idx = np.searchsorted(primes, flipped)
    for i in range(res.size):
        # flipped[i] is in primes iff the value at its insertion point matches it
        if idx[i] < len(primes) and primes[idx[i]] == flipped[i]:
            res[i] = True
        else:
            res[i] = False
    return res

This solution is 15 times faster than np.isin(flipped, primes) on my machine, and faster than all the other alternatives by a significant margin. It only takes about 2 µs on the provided input. It also scales relatively well.
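
For example, on the arrays from the question:

mask = compute(flipped, primes)   # boolean mask aligned with flipped
print(primes[mask])               # the mask can index any array of the same length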


Fastest solution for large arrays

For huge arrays, using a lookup table should be faster, since the above solution runs in O(n log m) time while a lookup-table implementation can run in linear time here. That being said, a lookup table also uses significantly more memory. The best approach is to use a Bloom filter to make the lookup table much more compact (thanks to hashing). However, this solution is significantly more complex to implement. There is an example here for setdiff1d. The fastest solutions often come at the price of significantly more complex code (there is no free lunch).
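
As a rough, illustrative sketch only (not the implementation linked above), a toy Bloom-filter-style check could look like this; the bit count M and the hash multipliers are arbitrary assumptions here, and candidates that pass the filter still need an exact verification because Bloom filters allow false positives:

import numpy as np

M = 1 << 12  # number of bits in the filter (arbitrary for this toy example)

def _hashes(x):
    # Two cheap multiplicative hashes (arbitrary odd constants)
    return (x * 0x9E3779B1) % M, (x * 0x85EBCA77) % M

def bloom_isin(a, b):
    bits = np.zeros(M, dtype=bool)
    for h in _hashes(b):
        bits[h] = True
    # An element can be in b only if all of its hash bits are set
    maybe = np.ones(a.shape, dtype=bool)
    for h in _hashes(a):
        maybe &= bits[h]
    # The filter can report false positives: verify the candidates exactly
    res = np.zeros(a.shape, dtype=bool)
    res[maybe] = np.isin(a[maybe], b)
    return res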

Jérôme Richard