Question description
Let's say we have two simple arrays:
import numpy as np

query = np.array([100, 4000, 500, 700, 400, 100])
match = np.array([6, 100, 4000, 100, 10, 8, 10])
I want to find the indices of all matching values between query and match. In this case the result would be:
value  query  match
 100     0      1
 100     0      3
 100     5      1
 100     5      3
4000     1      2
In reality these arrays will contain millions of items.
"Stupid" loop solution
qs = []
query_locs = []
match_locs = []
for i in range(query.size):
    q = query[i]
    # Get the indices in `match` equal to q
    match_loc = np.where(match == q)[0]
    n = match_loc.size
    # Update the location arrays
    match_locs.extend(match_loc)
    query_locs.extend(np.repeat(i, n))
    # Store the matching value once per hit
    qs.extend(np.repeat(q, n))
result = np.vstack((qs, query_locs, match_locs)).T
print(result)
[[ 100    0    1]
 [ 100    0    3]
 [4000    1    2]
 [ 100    5    1]
 [ 100    5    3]]
(Maybe numba could make this loop pretty fast; however, when I tried it I got some errors about the signatures, so I'm not sure about that.)
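For reference, the loop above can also be written as a single broadcasted comparison. This is only a sketch: it builds a len(query) × len(match) boolean matrix, so memory grows as O(n·m) and it will not scale to millions of items without chunking.

```python
import numpy as np

query = np.array([100, 4000, 500, 700, 400, 100])
match = np.array([6, 100, 4000, 100, 10, 8, 10])

# Compare every query element against every match element at once.
# The intermediate boolean matrix is len(query) x len(match), so this
# trades memory for speed.
query_locs, match_locs = np.nonzero(query[:, None] == match[None, :])
result = np.column_stack((query[query_locs], query_locs, match_locs))
print(result)
```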
NumPy built-ins
There are quite a few built-in NumPy functions for this kind of problem with unique values, such as searchsorted and intersect1d. However, as also described in the docs, they "Return the sorted, unique values" and thus do not take duplicates into account. Some examples on Stack Overflow for this problem with unique values:
- NumPy: Comparing Elements in Two Arrays
- Efficient way to compute intersecting values between two numpy arrays
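To illustrate the limitation: intersect1d can return indices via return_indices=True, but only for the first occurrence of each unique common value, so the duplicate 100s are dropped.

```python
import numpy as np

query = np.array([100, 4000, 500, 700, 400, 100])
match = np.array([6, 100, 4000, 100, 10, 8, 10])

# return_indices=True gives the index of the *first* occurrence of each
# unique common value - the duplicates at query[5] and match[3] are lost.
values, query_idx, match_idx = np.intersect1d(query, match, return_indices=True)
print(values)     # [ 100 4000]
print(query_idx)  # [0 1]
print(match_idx)  # [1 2]
```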
I could imagine there is a faster way to do this with NumPy instead of a loop, so I'm curious to see an answer!
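One possible direction (a sketch, not benchmarked): sort match once, then use searchsorted with side='left' and side='right' to find the span of equal values in the sorted copy for every query element. Apart from the span-expanding comprehension, everything is vectorised, and memory use stays O(n + m).

```python
import numpy as np

query = np.array([100, 4000, 500, 700, 400, 100])
match = np.array([6, 100, 4000, 100, 10, 8, 10])

# Sort `match` once and remember the original positions.
order = np.argsort(match, kind="stable")
sorted_match = match[order]

# For every query element, find the half-open span [left, right) of
# equal values in the sorted copy.
left = np.searchsorted(sorted_match, query, side="left")
right = np.searchsorted(sorted_match, query, side="right")
counts = right - left  # number of matches per query element

# Expand the spans into flat index arrays.
query_locs = np.repeat(np.arange(query.size), counts)
flat = np.concatenate([np.arange(l, r) for l, r in zip(left, right)])
match_locs = order[flat]  # map back to positions in the original `match`

result = np.column_stack((query[query_locs], query_locs, match_locs))
print(result)
```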