
I found this snippet of code online and am having difficulty understanding what each part of it does, as I'm not proficient in Python.

The following routine takes an array as input and returns a dictionary that maps each unique value to its indices:

import numpy as np

def partition(array):
    return {i: (array == i).nonzero()[0] for i in np.unique(array)}
Georgy
SomebodyOnEarth
  • Was the description not descriptive enough? It finds the indices of each unique value and puts them inside a dictionary. That said, this is badly written code. – cs95 Mar 17 '18 at 22:37
  • If there is a better way to do this, can you post it below? I don't understand what this part is doing (array == i).nonzero()[0] – SomebodyOnEarth Mar 17 '18 at 22:40
  • Is it actually the dictionary comprehension part you're having trouble understanding, rather than the numpy expression for each value that Coldspeed explained to you? – abarnert Mar 17 '18 at 22:46
  • When I say "badly written", I mean from the viewpoint of readability. Performance wise, not sure it can get much better than this (you could probably cythonize a loop yourself, but really, would you bother?). – cs95 Mar 17 '18 at 22:50
  • It is suboptimal because it needs to loop over the entire array as many times as there are unique values in the array. If there is nothing better, I suggest looping over the array once and building the dictionary incrementally. – Jean Paul May 30 '19 at 13:00
  • Normally, when one wants an explanation of the code, it's preferable to split it to separate parts. In this specific case it's unclear if there is a problem with understanding dict comprehension, boolean masks, or some other NumPy functionality. I'm voting to close this question as *needs more focus*. See [How to handle “Explain how this ${code dump} works” questions](https://meta.stackoverflow.com/questions/253894/how-to-handle-explain-how-this-code-dump-works-questions) for details. – Georgy May 25 '20 at 14:33
  • For those who come here just to get the solution for getting the indices, see the [faster alternative to numpy.where?](https://stackoverflow.com/q/33281957/7851470) post for a more general and, most probably, more efficient solution. – Georgy May 25 '20 at 14:36
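For reference, here is one sort-based way to avoid rescanning the whole array once per unique value. It is a sketch, not necessarily the method used in the linked post; `partition_sorted` is a hypothetical name, and it assumes a 1-D array whose values can be sorted:

```python
import numpy as np


def partition_sorted(array):
    """Map each unique value of a 1-D array to the indices where it occurs."""
    array = np.asarray(array)
    # One stable sort; equal values keep their original (ascending) index order.
    order = np.argsort(array, kind="stable")
    sorted_vals = array[order]
    # In the sorted array each value forms one contiguous run;
    # return_index gives the position where each run starts.
    uniques, starts = np.unique(sorted_vals, return_index=True)
    # Cut the sorted index array at every run boundary (skip the leading 0).
    groups = np.split(order, starts[1:])
    return dict(zip(uniques.tolist(), groups))


array = np.array([1, 1, 2, 3, 2, 1, 2, 3])
print(partition_sorted(array))
```

This does one O(n log n) sort instead of one O(n) scan per unique value, so it may pay off mainly when there are many distinct values.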

3 Answers


Trace each part out; it should speak for itself. Comments are inline.

In [304]: array = np.array([1, 1, 2, 3, 2, 1, 2, 3])

In [305]: np.unique(array)            # unique values in `array`
Out[305]: array([1, 2, 3])

In [306]: array == 1                  # retrieve a boolean mask where elements are equal to 1
Out[306]: array([ True,  True, False, False, False,  True, False, False])

In [307]: (array == 1).nonzero()[0]   # get the `True` indices for the operation above
Out[307]: array([0, 1, 5])

In summary, the code creates a mapping of <unique_value : all indices of unique_value in array>:

In [308]: {i: (array == i).nonzero()[0] for i in np.unique(array)}
Out[308]: {1: array([0, 1, 5]), 2: array([2, 4, 6]), 3: array([3, 7])}

And here's a slightly more readable version:

In [313]: mapping = {}
     ...: for i in np.unique(array):
     ...:     mapping[i] = np.where(array == i)[0]
     ...:

In [314]: mapping
Out[314]: {1: array([0, 1, 5]), 2: array([2, 4, 6]), 3: array([3, 7])}
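The comments on the question suggest scanning the array only once instead of once per unique value. A minimal sketch of that idea, assuming a 1-D input (`partition_single_pass` is a hypothetical name, not from the original code):

```python
from collections import defaultdict

import numpy as np


def partition_single_pass(array):
    """Map each unique value of a 1-D array to the indices where it occurs."""
    groups = defaultdict(list)
    # Single pass: append each position to the list for its value.
    for idx, value in enumerate(np.asarray(array).tolist()):
        groups[value].append(idx)
    # Sort keys to match np.unique's ascending order; convert lists to arrays.
    return {key: np.array(positions) for key, positions in sorted(groups.items())}


array = np.array([1, 1, 2, 3, 2, 1, 2, 3])
print(partition_single_pass(array))
```

This touches each element once, but every step runs in the Python interpreter. For large arrays with only a few distinct values, the vectorized per-value version above may well still be faster, so it is worth timing both on real data.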
cs95
  • array == i: returns a boolean array that is True wherever the element equals i and False otherwise.
  • nonzero(): returns the indices of the elements that are non-zero (i.e. not False), as a tuple containing one index array per dimension. https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.nonzero.html
  • nonzero()[0]: takes the first (and, for a 1-D array, the only) element of that tuple, i.e. the array of all indices where array[index] == i, not just the first such index.
  • for i in np.unique(array): iterates over the unique values of array, so the expression above is evaluated once for each unique value.
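The nonzero()[0] step is the one most easily misread. The following sketch uses the same sample array as the first answer (the 2-D `grid` is an extra illustration, not from the original) to show that `nonzero()` returns a tuple of index arrays, one per dimension, and that `[0]` unwraps the single array produced for 1-D input:

```python
import numpy as np

array = np.array([1, 1, 2, 3, 2, 1, 2, 3])
mask = array == 1

# nonzero() always returns a tuple with one index array per dimension.
idx_tuple = mask.nonzero()
print(idx_tuple)          # a 1-tuple holding the indices 0, 1, 5
print(idx_tuple[0])       # all matching indices, not only the first one

# For 1-D input, single-argument np.where and np.flatnonzero give the same indices.
print(np.where(mask)[0])
print(np.flatnonzero(mask))

# With a 2-D input the tuple holds row indices and column indices separately.
grid = np.array([[1, 2], [2, 1]])
rows, cols = (grid == 1).nonzero()
print(rows, cols)         # [0 1] [0 1]
```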
Eric
Amr Keleg

Consider also the following pandas solution:

import pandas as pd

In [165]: s = pd.Series(array)

In [166]: d = s.groupby(s).groups

In [167]: d
Out[167]:
{1: Int64Index([0, 1, 5], dtype='int64'),
 2: Int64Index([2, 4, 6], dtype='int64'),
 3: Int64Index([3, 7], dtype='int64')}

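PS: the returned pandas.Int64Index objects support indexing much like a regular 1D NumPy array. (Int64Index was removed in pandas 2.0; newer versions show the same result as a plain Index with dtype='int64'.)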

Each one can easily be converted to a NumPy array:

In [168]: {k:v.values for k,v in s.groupby(s).groups.items()}
Out[168]:
{1: array([0, 1, 5], dtype=int64),
 2: array([2, 4, 6], dtype=int64),
 3: array([3, 7], dtype=int64)}
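A related pandas option: GroupBy.indices maps each key directly to integer positions as NumPy arrays, whereas .groups (used above) maps each key to index labels. The two agree here only because pd.Series(array) gets the default 0..n-1 index. The sketch below assumes a reasonably recent pandas, and the lettered index is purely illustrative, chosen to show where the two differ:

```python
import numpy as np
import pandas as pd

array = np.array([1, 1, 2, 3, 2, 1, 2, 3])
s = pd.Series(array)

# .indices: key -> NumPy array of integer positions.
positions = s.groupby(s).indices
print(positions)

# With a non-default index, .groups returns labels but .indices still returns positions.
labeled = pd.Series(array, index=list("abcdefgh"))
print(labeled.groupby(labeled).groups[1])    # labels a, b, f
print(labeled.groupby(labeled).indices[1])   # positions 0, 1, 5
```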
MaxU - stand with Ukraine