Find indices of each bin using numpy

Question

I'm encountering a problem that I hope you can help me solve.

I have a 2D numpy array which I want to divide into bins by value. Then I need to know the exact initial indices of all the numbers in each bin.

For example, consider the matrix

    [[1,2,3], [4,5,6], [7,8,9]]

and the bin array

    [0,2,4,6,8,10].

Then the first element ([0,0]) should be stored in one bin, the next two elements ([0,1],[0,2]) in another bin, and so on. The desired output looks like this:

    [[[0,0]],[[0,1],[0,2]],[[1,0],[1,1]],[[1,2],[2,0]],[[2,1],[2,2]]]

Even though I tried several numpy functions, I'm not able to do this in an elegant way. The best attempt might be

    >>> import numpy as np
    >>> a = [[1,2,3], [4,5,6], [7,8,9]]
    >>> bins = [0,2,4,6,8,10]
    >>> bin_in_mat = np.digitize(a, bins, right=False)
    >>> bin_in_mat
    array([[1, 2, 2],
           [3, 3, 4],
           [4, 5, 5]])
    >>> indices = np.argwhere(bin_in_mat)
    >>> indices
    array([[0, 0],
           [0, 1],
           [0, 2],
           [1, 0],
           [1, 1],
           [1, 2],
           [2, 0],
           [2, 1],
           [2, 2]])

but this doesn't solve my problem. Any suggestions?

Post a more random sample case? That looks too simplistic to get any idea about the desired output for a generic case. — Divakar, Jul 12 '18 at 07:24
Have a look here https://stackoverflow.com/q/26783719/3753826 — divenex, Aug 30 '22 at 10:42

score 2 · Accepted Answer · answered Jul 12 '18 at 08:01

You need to leave numpy and use a loop for this. Your bins hold different numbers of indices, and a single regular numpy array cannot represent a ragged result like that:

    >>> bin_in_mat = np.digitize(a, bins, right=False)
    >>> bin_contents = [np.argwhere(bin_in_mat == i) for i in range(len(bins))]
    >>> for b in bin_contents:
    ...     print(repr(b))
    ...
    array([], shape=(0, 2), dtype=int64)
    array([[0, 0]], dtype=int64)
    array([[0, 1],
           [0, 2]], dtype=int64)
    array([[1, 0],
           [1, 1]], dtype=int64)
    array([[1, 2],
           [2, 0]], dtype=int64)
    array([[2, 1],
           [2, 2]], dtype=int64)
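
For reference, here is a self-contained sketch of the same loop that also converts the result into the nested-list form shown in the question. Starting the loop at bin 1 (skipping bin 0, which would hold values below bins[0]) and the .tolist() conversion are assumptions about what the asker wants, not anything numpy requires:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
bins = [0, 2, 4, 6, 8, 10]

# Bin number of every element; 0 would mean "below bins[0]".
bin_in_mat = np.digitize(a, bins, right=False)

# One (n_i, 2) index array per bin, starting at 1 to skip the underflow bin.
bin_contents = [np.argwhere(bin_in_mat == i) for i in range(1, len(bins))]

# Plain nested lists, since the bins may hold different numbers of indices.
nested = [b.tolist() for b in bin_contents]
print(nested)
```

For the example matrix, this should reproduce the desired output from the question.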

Note that before NumPy 1.15, digitize is a bad choice for large integer input; there it is faster and more correct to write bin_in_mat = np.searchsorted(bins, a, side='right'). For increasing bins, side='right' is the counterpart of right=False.
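
A quick check of that correspondence on the example data, as a sketch: for monotonically increasing bins, the NumPy documentation pairs digitize(..., right=False) with searchsorted(..., side='right'), and right=True with side='left'. The variable names below are only illustrative:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
bins = np.array([0, 2, 4, 6, 8, 10])

via_digitize = np.digitize(a, bins, right=False)

# side='right' puts a value equal to an edge into the bin above that edge,
# which is what right=False does.
via_right = np.searchsorted(bins, a, side='right')

# side='left' puts such a value into the bin below the edge instead.
via_left = np.searchsorted(bins, a, side='left')

same_as_right = np.array_equal(via_digitize, via_right)
same_as_left = np.array_equal(via_digitize, via_left)
print(same_as_right, same_as_left)
```

The two sides only disagree for values that sit exactly on a bin edge (2, 4, 6 and 8 here), so the difference is easy to miss on data that never hits an edge.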
