Declarative way to return all indices of matching elements for each element in numpy?

Question

Say I have an array:

arr = [1, 1, 2, 2, 1]

I want to get an array indices where each element indices[i] is the list of indices having values equal to the value at arr[i]. i.e., indices should be

[[0, 1, 4], [0, 1, 4], [2, 3], [2, 3], [0, 1, 4]]

It'd be easy to do this using loops or iterators, but I want a declarative numpy solution. I feel like there must be some clear/fast/concise way to get the output I'm looking for.

Does this answer your question? https://stackoverflow.com/a/30003565/14923227 It includes a vectorized solution. — Kevin, Mar 30 '21 at 19:25
@Kevin : I saw this one while searching. It's closely related, but I thought that there would probably be a way to get output in the exact form that I'm looking for in at most a couple of lines. (I like ELinda's solution, but wanted to leave the question open for a bit in case anyone had a way to do it without using lambdas.) — user904890, Mar 30 '21 at 19:34
I have a solution without lambdas. Gimme a couple of minutes — Mad Physicist, Mar 30 '21 at 19:47
A list of lists that differ in length is not a valid numpy array. Don't expect a simple numpy solution. — hpaulj, Mar 30 '21 at 19:57
@user904890. I've updated my answer. There was an error in the `axis` argument. — Mad Physicist, Mar 30 '21 at 21:35
@MadPhysicist, and like most magic, there's a niden slight-of-hand. `split` iterates through a bunch of slices. — hpaulj, Mar 30 '21 at 23:23
@hpaulj. Of course. It's just a lot faster to do that once than to call `where` or `nonzero` over and over. The goal is to minimize the python level iteration since you can't get rid of it entirely. — Mad Physicist, Mar 31 '21 at 00:23

score 0 · Answer 1 · answered Mar 30 '21 at 19:08

import numpy as np

arr = np.array([1, 1, 2, 2, 1])
f = lambda i: np.where(arr == i)
result = np.array(list(map(f, arr)))

result is:

array([[array([0, 1, 4])],
       [array([0, 1, 4])],
       [array([2, 3])],
       [array([2, 3])],
       [array([0, 1, 4])]], dtype=object)

Mad Physicist · Accepted Answer · 2021-03-30T21:32:24.850

Your output is going to be a list (or worse, an array of objects), since your output is ragged in the general case. If you are OK with having each index corresponding to a repeated element point to the same underlying array, you can so something like the following.

The gist is to take a hint from np.unique, which is a sort-based operation. Instead of using np.unique, we can use np.argsort combined with some fancy index math to get the job done.

First, you get an array with [0, 1, 4], [2, 3] as the elements. This will be a 1D array of objects. Actually, if the split list is non-ragged, elements will recombine into a 2D array, but it doesn't matter because it will not affect the intended interface.

idx = np.argsort(arr)
split_points = np.flatnonzero(np.diff(arr[idx])) + 1
elements = np.array(np.split(idx, split_points))

Next you can index elements to produce the full array of objects.

inverse_index = np.argsort(idx)
counts = np.diff(np.r_[0, split_points, len(arr)])
result = np.repeat(elements, counts, axis=0)[inverse_index]

result will be a 2D array if you have equal numbers of each unique element. You can choose to turn it into a list if you want.

Notice that the last part works because np.argsort is its own inverse: the index that puts a sorted array into its original unsorted order is the argsort of the argsort. So we've implemented most of the features of np.unique (inverse_index, counts) with intermediate results to make your specific application possible. To complete np.unique, the forward index is np.r_[0, split_points], and the actual unique values are given by arr[np.r_[0, split_points]].

You can shorten the code from 6 lines to about 3 without recomputing any of the necessary arrays more than once. At the same time, you can say goodbye to any semblance of legibility that was there before:

idx = np.argsort(arr)
split_points = np.flatnonzero(np.diff(arr[idx])) + 1
result = np.repeat(np.array(np.split(idx, split_points)), np.diff(np.r_[0, split_points, len(arr)]), axis=0)[np.argsort(idx)]

Declarative way to return all indices of matching elements for each element in numpy?

2 Answers2