Here is an attempt that doesn't use any iteration, and switches the membership checking to np.isin()
:
>>> arr2[np.isin(arr2, arr1[np.isin(arr1, arr2)])][:cap]
array([3, 1, 2])
Since this is arguably harder to read than your list comprehension, here are the steps broken down:
# find elements of arr1 in arr2
subset_arr1 = arr1[np.isin(arr1, arr2)] # array([3, 7, 1, 2])
# find the elements of arr2 in subset1, this gets the desired order
subset_arr2 = arr2[np.isin(arr2, subset_arr1)] # array([3, 1, 2, 7])
# take the first cap elements
ans = subset_arr2[:cap] # array([3, 1, 2])
Timing
My original testing indicated my np.isin()
approach to be the fastest. However, Georgy since updated their answer (the for
loop) with a bug fix. The new testing then indicated Georgy's answer to be fastest for all tested data sizes:

"np.isin" is my answer, "listcomp" is your original solution, "forloop" is the other existing answer. In the above, the size of N (x-axis) is the length of arr1
, arr2
is double that length, and the cap
is fixed at 5
(I thought this seemed reasonable, but see below for tweaks).
It seems the ability to stop early by the for loop can be very beneficial! In my answer, the code has to check all elements of arr1
for membership in arr2
(subset_arr1
above), and then all elements of arr2
for membership in that subset. Georgy's loop OTOH only has to check for membership of some of the elements of arr2
in arr1
. So even though my looping should be faster, there can be much more of it!
The only parameter I tried fiddling with was the cap
, which shows how smaller cap
sizes are better for the for loop (smaller cap -> less iteration). The above was cap == 5
; with cap == 100
:

And cap == 1000
:

I reckon the other parameters could influence things as well (e.g. how likely it is for the is-in operation to return true AKA the range of numerical values, or the relative sizes of the two arrays). I have made the perfplot code below more parameterized if anyone wants to test - I'm tired of thinking about complexity :
import numpy as np
import perfplot
cap = 5
minnumber = 1
maxnumber = 10
factor1 = 1 # how size of arr1 relates to n
factor2 = 2 # how size of arr2 relates to n
low_n = 3
high_n = 18
def make_data(n):
arr1 = np.random.randint(minnumber, maxnumber, n*factor1)
arr2 = np.random.randint(minnumber, maxnumber, n*factor2)
return arr1, arr2
def listcomp(arr1, arr2):
return np.array([el for el in arr2 if el in arr1][:cap])
def npisin(arr1, arr2):
return arr2[np.isin(arr2, arr1[np.isin(arr1, arr2)])][:cap]
def forloop(arr1, arr2):
capcopy = cap
lst=[]
for el in arr2:
if el in arr1:
lst.append(el)
capcopy -=1
if capcopy <1:
break
return np.array(lst)
perfplot.show(
setup=lambda n : make_data(n),
kernels = [
lambda x: listcomp(*x),
lambda x: npisin(*x),
lambda x: forloop(*x),
],
labels=['listcomp', 'np.isin', 'forloop'],
n_range=[2 ** k for k in range(low_n, high_n)])