
I have a 2D NumPy array containing integer values from 0 to n. I want to get a list of length n such that the i-th element of that list is an array of all the indices with value i+1 (0 is excluded).

For example, for the input

array([[1, 0, 1],
       [2, 2, 0]])

I'm expecting to get

[array([[0, 0], [0, 2]]), array([[1, 0], [1, 1]])]

I found this related question, "Get a list of all indices of repeated elements in a numpy array", which may be helpful, but I was hoping for a more direct solution that doesn't require flattening and sorting the array and that is as efficient as possible.
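To pin down the requested output, here is a simple loop-based baseline. The function name `indices_by_value` is hypothetical, and it assumes the array holds non-negative integers. It follows the spec literally: any value in 1..n that never occurs gets an empty array, so the list always has length n. It scans the whole array once per value, which is exactly the cost the question hopes to avoid, but it is handy as a correctness check for faster versions.

```python
import numpy as np

def indices_by_value(arr):
    """Hypothetical baseline: entry i holds the indices where arr == i + 1.

    Assumes arr holds non-negative integers. Values in 1..n that do not
    occur get an empty (0, arr.ndim) array, so the result has length n.
    """
    n = int(arr.max())
    # np.argwhere returns one row of coordinates per matching element
    return [np.argwhere(arr == value) for value in range(1, n + 1)]

arr = np.array([[1, 0, 1],
                [2, 2, 0]])
print(indices_by_value(arr))
```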

  • Your expected result is a list of arrays of varying size. There's no 'direct' way. Study the linked answers before you reject them. – hpaulj Feb 17 '19 at 16:14
  • @hpaulj: Exactly my point. Too much customization is asked in the question. And the desired output is already a list – Sheldore Feb 17 '19 at 16:15
  • Are you going to use the resulting list in a 'direct' and 'efficient' manner? – hpaulj Feb 17 '19 at 17:33
  • See also: [faster alternative to numpy.where?](https://stackoverflow.com/q/33281957/7851470) – Georgy May 25 '20 at 14:44

2 Answers


Here's a vectorized approach, which works for arrays with an arbitrary number of dimensions. The idea is to extend the functionality of the `return_index` parameter of `np.unique` and return a list of arrays, each containing the N-dimensional indices of one unique value in the array.
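The core trick can be seen step by step on the flattened array from the question. This is an illustrative sketch rather than part of the original answer; it uses a stable sort (`kind="stable"`) so that equal values keep their original row-major order.

```python
import numpy as np

# Flattened version of the question's array [[1, 0, 1], [2, 2, 0]]
x_flat = np.array([1, 0, 1, 2, 2, 0])

# Stable argsort: flat positions in ascending value order,
# ties kept in their original (row-major) order
order = np.argsort(x_flat, kind="stable")  # [1, 5, 0, 2, 3, 4]
sorted_vals = x_flat[order]                # [0, 0, 1, 1, 2, 2]

# On a sorted array, return_index gives the start of each run of equal values
vals, starts = np.unique(sorted_vals, return_index=True)  # [0, 1, 2], [0, 2, 4]

# Splitting the permutation at the run starts (skipping the leading 0)
# groups the flat positions by value
groups = np.split(order, starts[1:])  # [array([1, 5]), array([0, 2]), array([3, 4])]
print(vals, groups)
```

Converting each group of flat positions back to coordinates with `np.unravel_index` gives the N-dimensional indices.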

For a more compact solution, I've wrapped the steps in the following function:

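import numpy as np

def ndix_unique(x):
    """
    Returns an N-dimensional array of indices
    of the unique values in x

    Parameters
    ----------
    x: np.array
       Array with arbitrary dimensions

    Returns
    -------
    - 1D-array of sorted unique values
    - List of arrays. Each array contains the indices where a
      given value in x is found
    """
    # Flatten, then get the flat positions in ascending value order.
    # A stable sort keeps equal values in row-major order.
    x_flat = x.ravel()
    ix_flat = np.argsort(x_flat, kind='stable')
    # On the sorted values, return_index marks where each run of equal values starts
    u, ix_u = np.unique(x_flat[ix_flat], return_index=True)
    # Convert flat positions back to N-dimensional coordinates, one row per element
    ix_ndim = np.unravel_index(ix_flat, x.shape)
    ix_ndim = np.c_[ix_ndim] if x.ndim > 1 else ix_flat
    # Split the coordinates at the start of each run (the first run starts at 0)
    return u, np.split(ix_ndim, ix_u[1:])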

Checking with the array from the question:

a = np.array([[1, 0, 1],[2, 2, 0]])

vals, ixs = ndix_unique(a)

print(vals)
[0 1 2]

print(ixs)
[array([[0, 1],
        [1, 2]]), 
 array([[0, 0],
        [0, 2]]), 
 array([[1, 0],
        [1, 1]])]
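Note that this output includes a group for 0, and a value that doesn't occur gets no group at all. If you want exactly the question's format (a list of length n for the values 1..n, with empty arrays for missing values), the same sort-and-split idea works with `np.bincount` supplying the group sizes. This is a sketch under stated assumptions: a non-empty array of non-negative integers, and a hypothetical function name `indices_by_value_sorted`.

```python
import numpy as np

def indices_by_value_sorted(arr):
    """Sketch: entry i holds the coordinates where arr == i + 1, for i in 0..n-1.

    Assumes arr is a non-empty array of non-negative integers
    (np.bincount rejects negative values).
    """
    flat = arr.ravel()
    n = int(flat.max())
    # Stable sort: flat positions grouped by value, ties in row-major order
    order = np.argsort(flat, kind="stable")
    # counts[v] = number of occurrences of v, for v = 0..n
    counts = np.bincount(flat, minlength=n + 1)
    # One row of N-dimensional coordinates per element, in sorted order
    coords = np.column_stack(np.unravel_index(order, arr.shape))
    # Split at the cumulative group sizes; empty groups become (0, ndim) arrays
    groups = np.split(coords, np.cumsum(counts)[:-1])
    # groups[0] holds the zeros, which the question excludes
    return groups[1:]

a = np.array([[1, 0, 1], [2, 2, 0]])
print(indices_by_value_sorted(a))
```

Because the group sizes come from `np.bincount` rather than from `np.unique`, every value in 0..n gets a slot, even if it never occurs.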

Let's try another case:

a = np.array([[1,1,4],[2,2,1],[3,3,1]])

vals, ixs = ndix_unique(a)

print(vals)
[1 2 3 4]

print(ixs)
[array([[0, 0],
        [0, 1],
        [1, 2],
        [2, 2]]),
 array([[1, 0],
        [1, 1]]),
 array([[2, 0],
        [2, 1]]),
 array([[0, 2]])]

For a 1D array:

a = np.array([1,5,4,3,3])

vals, ixs = ndix_unique(a)

print(vals)
[1 3 4 5]

print(ixs)
[array([0]), array([3, 4]), array([2]), array([1])]

Finally another example with a 3D ndarray:

a = np.array([[[1,1,2]],[[2,3,4]]])

vals, ixs = ndix_unique(a)

print(vals)
[1 2 3 4]

print(ixs)
[array([[0, 0, 0],
        [0, 0, 1]]),
 array([[0, 0, 2],
        [1, 0, 0]]),
 array([[1, 0, 1]]),
 array([[1, 0, 2]])]
  • Ah, yes @Bazingaa but notice that I need to work with both `y` and a flattened version of it. So the flattening is not to obtain the unique values – yatu Feb 17 '19 at 19:44
  • Sometimes `return_index` or `return_inverse` for `unique` is useful. – hpaulj Feb 17 '19 at 20:28
  • Hi @kontradictos. You're welcome! I'm simplifying the code and changing some things. The reason behind this is that it was adapted from some other function, which contemplated that the vector of unique values (here `x`) might not be sorted. Here that is not the case. Updating in a few mins – yatu Feb 18 '19 at 10:46
  • @hpaulj thanks for your suggestion btw, it helped simplify my code a bit :-) – yatu Feb 18 '19 at 11:11
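Following up on the `return_inverse` suggestion, here is a sketch of that route (the name `ndix_unique_inverse` is hypothetical). `return_inverse` labels each element with the position of its value in the sorted unique array, and `return_counts` gives each group's size, so the split points are just running totals. It still sorts twice (once inside `np.unique`, once in `np.argsort`), so it is not necessarily faster, and unlike `ndix_unique` it always returns 2D coordinate arrays, including `(k, 1)` arrays for 1D input.

```python
import numpy as np

def ndix_unique_inverse(x):
    """Sketch: sorted unique values of x and a list of coordinate arrays per value."""
    # u: sorted unique values; inv: for each flat element, its position in u;
    # counts: how many times each unique value occurs
    u, inv, counts = np.unique(x.ravel(), return_inverse=True, return_counts=True)
    # Stable argsort of the labels groups flat positions by value (row-major within a group)
    order = np.argsort(inv, kind="stable")
    coords = np.column_stack(np.unravel_index(order, x.shape))
    # Group boundaries are the running totals of the counts
    return u, np.split(coords, np.cumsum(counts)[:-1])

vals, ixs = ndix_unique_inverse(np.array([[1, 1, 4], [2, 2, 1], [3, 3, 1]]))
print(vals, ixs)
```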

You can first get the non-zero elements of your array and then use np.argwhere in a list comprehension to get a separate array for each non-zero value. Here np.unique(arr[arr!=0]) gives you the unique non-zero values, over which you can iterate to get the indices.

import numpy as np

arr = np.array([[1, 0, 1],
                [2, 2, 0]])

# One np.argwhere call per unique non-zero value
indices = [np.argwhere(arr == i) for i in np.unique(arr[arr != 0])]
# [array([[0, 0],
#         [0, 2]]), array([[1, 0],
#         [1, 1]])]
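A small variation, not from the original answer: building a dict keyed by value makes it explicit which value each index array belongs to. It still makes one pass over the array per distinct value; the variable name `by_value` is just illustrative.

```python
import numpy as np

arr = np.array([[1, 0, 1],
                [2, 2, 0]])

# Map each distinct non-zero value to the coordinates where it occurs;
# int(...) turns NumPy scalars into plain Python int keys
by_value = {int(v): np.argwhere(arr == v) for v in np.unique(arr[arr != 0])}
print(by_value)
```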
  • In this way I can't know if the index belongs to value 1 or 2. I want to get two lists of indices, one for each nonzero value. – kontradictos Feb 17 '19 at 15:24
  • Yes, here there is no correspondence to the different values in the array. It just returns the coordinates of those greater than 0 regardless of their value – yatu Feb 17 '19 at 15:25
  • @yatu: Check my edited answer. Thanks for commenting – Sheldore Feb 17 '19 at 15:28
  • @kontradictos: I modified my answer to your needs – Sheldore Feb 17 '19 at 15:29
  • Now yes, not easy to avoid a for loop here :-) – yatu Feb 17 '19 at 15:30
  • @yatu: there might be a direct solution here but since the OP wants a separate array for each non-zero element, I think this is the way to go – Sheldore Feb 17 '19 at 15:31
  • Did something similar for 1D vectors [see](https://stackoverflow.com/questions/54581381/indices-of-intersection-between-arrays/54581778#54581778), but dealing with more dimensions makes it much trickier. Nice one btw, +1 – yatu Feb 17 '19 at 15:33
  • It seems very inefficient because of the loop. I should have mentioned that I'm looking for a solution that is as efficient as possible, I'll add it to the question. – kontradictos Feb 17 '19 at 15:40
  • Did you try applying it to your real problem? How much time does it take right now? – Sheldore Feb 17 '19 at 15:48
  • @kontradictos: While simply using `np.argwhere` is sufficient to get all indices, you additionally need them grouped by value. I am afraid that is too much customization for a built-in numpy operation. Anyway, maybe someone else will answer without using a for loop – Sheldore Feb 17 '19 at 16:15