Indices of unique values in n-dimensional array

Question

I have a 2D NumPy array containing values from 0 to n. I want to get a list of length n such that the i-th element of that list is an array of all the indices with value i+1 (0 is excluded).

For example, for the input

array([[1, 0, 1],
       [2, 2, 0]])

I'm expecting to get

[array([[0, 0], [0, 2]]), array([[1, 0], [1, 1]])]

I found a related question, "Get a list of all indices of repeated elements in a numpy array", which may be helpful. However, I was hoping for a more direct solution that doesn't require flattening and sorting the array, and that is as efficient as possible.

Your expected result is a list of arrays of varying size. There's no 'direct' way. Study the linked answers before you reject them. — hpaulj, Feb 17 '19 at 16:14
@hpaulj: Exactly my point. Too much customization is asked in the question. And the desired output is already a list — Sheldore, Feb 17 '19 at 16:15
Are you going to use the resulting list in a 'direct' and 'efficient' manner? — hpaulj, Feb 17 '19 at 17:33
See also: [faster alternative to numpy.where?](https://stackoverflow.com/q/33281957/7851470) — Georgy, May 25 '20 at 14:44

yatu · Accepted Answer · 2020-04-13T20:12:02.203

Here's a vectorized approach that works for arrays with an arbitrary number of dimensions. The idea is to extend what the return_index parameter of np.unique provides: instead of only the first occurrence of each unique value, return a list of arrays, each containing the N-dimensional indices of every occurrence of one unique value.

For a more compact solution, I've defined the following function along with some explanations throughout the different steps:

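import numpy as np

def ndix_unique(x):
    """
    Returns the unique values in x together with the
    N-dimensional indices at which each of them occurs
    Parameters
    ----------
    x: np.array
       Array with arbitrary dimensions
    Returns
    -------
    - 1D-array of sorted unique values
    - List of arrays. Each array contains the indices where a
      given value in x is found
    """
    # Flatten, then sort stably so equal values keep row-major order
    x_flat = x.ravel()
    ix_flat = np.argsort(x_flat, kind='stable')
    # Start position of each unique value in the sorted array
    u, ix_u = np.unique(x_flat[ix_flat], return_index=True)
    # Map the flat indices back to N-dimensional coordinates
    ix_ndim = np.unravel_index(ix_flat, x.shape)
    ix_ndim = np.c_[ix_ndim] if x.ndim > 1 else ix_flat
    # Split the sorted indices at the start of each new value
    return u, np.split(ix_ndim, ix_u[1:])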
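
To see what each step produces, here is a small walk-through of the same operations on the array from the question. It is only an illustrative sketch: the names `sorted_vals` and `groups` are not part of the function, and `kind="stable"` is an assumption made here so that equal values keep their row-major order.

```python
import numpy as np

a = np.array([[1, 0, 1],
              [2, 2, 0]])

x_flat = a.ravel()                             # [1 0 1 2 2 0]
# Stable sort: positions of the 0s first, then the 1s, then the 2s
ix_flat = np.argsort(x_flat, kind="stable")    # [1 5 0 2 3 4]
sorted_vals = x_flat[ix_flat]                  # [0 0 1 1 2 2]
# ix_u holds the position where each unique value starts in sorted_vals
u, ix_u = np.unique(sorted_vals, return_index=True)   # u = [0 1 2], ix_u = [0 2 4]
# Turn flat positions into (row, col) pairs, one pair per row
ix_ndim = np.c_[np.unravel_index(ix_flat, a.shape)]
# Cut the list of pairs before positions 2 and 4, giving one group per value
groups = np.split(ix_ndim, ix_u[1:])
```

The first split point is always 0, which is why `ix_u[1:]` is passed to `np.split` rather than `ix_u`.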

Checking with the array from the question -

a = np.array([[1, 0, 1],[2, 2, 0]])

vals, ixs = ndix_unique(a)

print(vals)
[0 1 2]

print(ixs)
[array([[0, 1],
        [1, 2]]), 
 array([[0, 0],
        [0, 2]]), 
 array([[1, 0],
        [1, 1]])]
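
Since the question excludes 0, the group for that value can be dropped afterwards. The snippet below is a minimal sketch of one way to do that; the name `nonzero_ixs` is hypothetical, and the function is repeated (with a stable sort, which is an assumption) only so that the snippet runs on its own.

```python
import numpy as np

def ndix_unique(x):
    # Same approach as the function above, repeated so this snippet is self-contained
    x_flat = x.ravel()
    ix_flat = np.argsort(x_flat, kind="stable")
    u, ix_u = np.unique(x_flat[ix_flat], return_index=True)
    ix_ndim = np.unravel_index(ix_flat, x.shape)
    ix_ndim = np.c_[ix_ndim] if x.ndim > 1 else ix_flat
    return u, np.split(ix_ndim, ix_u[1:])

a = np.array([[1, 0, 1],
              [2, 2, 0]])
vals, ixs = ndix_unique(a)

# Keep only the index groups that belong to a nonzero value
nonzero_ixs = [ix for v, ix in zip(vals, ixs) if v != 0]
```

Note that this only yields a list of length n if every value from 1 to n actually occurs in the array; a missing value simply has no entry.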

Let's try another case:

a = np.array([[1,1,4],[2,2,1],[3,3,1]])

vals, ixs = ndix_unique(a)

print(vals)
[1 2 3 4]

print(ixs)
[array([[0, 0],
        [0, 1],
        [1, 2],
        [2, 2]]),
 array([[1, 0],
        [1, 1]]),
 array([[2, 0],
        [2, 1]]),
 array([[0, 2]])]

For a 1D array:

a = np.array([1,5,4,3,3])

vals, ixs = ndix_unique(a)

print(vals)
[1 3 4 5]

print(ixs)
[array([0]), array([3, 4]), array([2]), array([1])]

Finally, another example with a 3D ndarray:

a = np.array([[[1,1,2]],[[2,3,4]]])

vals, ixs = ndix_unique(a)

print(vals)
[1 2 3 4]

print(ixs)
[array([[0, 0, 0],
        [0, 0, 1]]),
 array([[0, 0, 2],
        [1, 0, 0]]),
 array([[1, 0, 1]]),
 array([[1, 0, 2]])]

Ah, yes @Bazingaa but notice that I need to work with both `y` and a flattened version of it. So the flattening is not to obtain the unique values — yatu, Feb 17 '19 at 19:44
Sometimes `return_index` or `return_inverse` for `unique` is useful. — hpaulj, Feb 17 '19 at 20:28
Hi @kontradictos. You're welcome! I'm simplifying the code and changing some things. The reason is that it was adapted from another function, which allowed for the vector of unique values (here `x`) being unordered. That is not the case here. Updating in a few mins — yatu, Feb 18 '19 at 10:46
@hpaulj thanks for your suggestion btw, it helped simplify my code a bit :-) — yatu, Feb 18 '19 at 11:11

Sheldore · Answer 2 · 2019-02-17T15:33:27.860

You can first get the nonzero elements of your array and then use argwhere in a list comprehension to get a separate array for each nonzero value. Here np.unique(arr[arr!=0]) gives the unique nonzero values, over which you can iterate to get the indices.

import numpy as np

arr = np.array([[1, 0, 1],
                [2, 2, 0]])

indices = [np.argwhere(arr==i) for i in np.unique(arr[arr!=0])]
# [array([[0, 0],
#         [0, 2]]), array([[1, 0],
#         [1, 1]])]
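
If you also need to know which value each index array belongs to, one option is to key the results by value. This is just a sketch of that variation on the same argwhere loop; the name `indices_by_value` is hypothetical.

```python
import numpy as np

arr = np.array([[1, 0, 1],
                [2, 2, 0]])

# Map each unique nonzero value to the array of its (row, col) indices
indices_by_value = {int(v): np.argwhere(arr == v)
                    for v in np.unique(arr[arr != 0])}
```

It still scans the whole array once per unique value, so it is no faster than the list comprehension; it only makes the value-to-indices correspondence explicit.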

edited Feb 17 '19 at 15:33

answered Feb 17 '19 at 15:17


In this way I can't know if the index belongs to value 1 or 2. I want to get two lists of indices, one for each nonzero value. – kontradictos Feb 17 '19 at 15:24
Yes, here there is no correspondence to the different values in the array. It just returns the coordinates of those greater than 0 regardless of their value – yatu Feb 17 '19 at 15:25
@yatu: Check my edited answer. Thanks for commenting – Sheldore Feb 17 '19 at 15:28
@kontradictos: I modified my answer to your needs – Sheldore Feb 17 '19 at 15:29
Now yes, not easy to avoid a for loop here :-) – yatu Feb 17 '19 at 15:30
@yatu: there might be a direct solution here but since the OP wants a separate array for each non-zero element, I think this is the way to go – Sheldore Feb 17 '19 at 15:31
Did something similar for 1D vectors [see](https://stackoverflow.com/questions/54581381/indices-of-intersection-between-arrays/54581778#54581778), but dealing with more dimensions makes it much trickier. Nice one btw, +1 – yatu Feb 17 '19 at 15:33
It seems very inefficient because of the loop. I should have mentioned that I'm looking for a solution that is as efficient as possible, I'll add it to the question. – kontradictos Feb 17 '19 at 15:40
Did you try applying it to your real problem? How much time does it take right now? – Sheldore Feb 17 '19 at 15:48
@kontradictos: While simply using `np.argwhere` is sufficient to get all indices, you additionally need them as a list grouped by element. I am afraid that is too much customization for a built-in numpy operation. Anyway, maybe someone else will answer without using a for loop – Sheldore Feb 17 '19 at 16:15
