
How can I get an element-wise count of each element's number of occurrences in a numpy array, along a given axis? By "element-wise," I mean each value of the array should be converted to the number of times it appears.

Simple 2D input:

[[1, 1, 1],
 [2, 2, 2],
 [3, 4, 5]]

Should output:

[[3, 3, 3],
 [3, 3, 3],
 [1, 1, 1]]

The solution also needs to work relative to a given axis. For example, if my input array a has shape (4, 2, 3, 3), which I think of as "a 4x2 matrix of 3x3 matrices," running solution(a) should spit out a (4, 2, 3, 3) solution of the form above, where each 3x3 "submatrix" contains counts of the corresponding elements relative to that submatrix alone, rather than the entire numpy array at large.

More complex example: suppose I take the example input above a and call skimage.util.shape.view_as_windows(a, (2, 2)). This gives me array b of shape (2, 2, 2, 2):

[[[[1 1]
   [2 2]]

  [[1 1]
   [2 2]]]


 [[[2 2]
   [3 4]]

  [[2 2]
   [4 5]]]]

Then solution(b) should output:

[[[[2 2]
   [2 2]]

  [[2 2]
   [2 2]]]


 [[[2 2]
   [1 1]]

  [[2 2]
   [1 1]]]]

So even though the value 1 occurs 3 times in a and 4 times in b, it only occurs twice in each 2x2 window.

CaptainStiggz
  • Elaborate on - `element-wise count along axis of values in numpy array`? What exactly are you counting? – Divakar Nov 06 '17 at 05:16
  • @Divakar I want to count the number of occurrences of each element. I'll edit the question to make it more clear. Related to [the question](https://stackoverflow.com/questions/47109031/calculating-windowed-probabilities-in-numpy-scipy/47109217#47109217) you cleverly answered yesterday. – CaptainStiggz Nov 06 '17 at 05:23
  • @CurtF. Looping along the relevant axes and constructing a new array using regular Python loops is fairly straightforward, but too slow. I looked at using `np.histogram` and `np.bincount`, but neither seems well-suited for the task, as they require flattened arrays. – CaptainStiggz Nov 06 '17 at 05:28

1 Answer


Starting off approach

We can use np.unique to get the counts of occurrences and also tag each element from 0 onwards, letting us index into those counts with the tags for the desired output, like so -

In [43]: a
Out[43]: 
array([[1, 1, 1],
       [2, 2, 2],
       [3, 4, 5]])

In [44]: _,ids,c = np.unique(a, return_counts=True, return_inverse=True)

In [45]: c[ids].reshape(a.shape)
Out[45]: 
array([[3, 3, 3],
       [3, 3, 3],
       [1, 1, 1]])

For non-negative integers in the input array, we can also use np.bincount -

In [73]: c = np.bincount(a.ravel())

In [74]: c[a]
Out[74]: 
array([[3, 3, 3],
       [3, 3, 3],
       [1, 1, 1]])

For arrays that contain negative integers, simply subtract the array's minimum first so that every value is non-negative.
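The offset idea can be sketched as follows. This is a minimal illustration, assuming an integer array; the function name `count_occurrences_int` is made up for this example and is not part of NumPy.

```python
import numpy as np

def count_occurrences_int(a):
    """Replace each integer in `a` with its number of occurrences in `a`.

    Hypothetical helper for illustration; assumes a non-empty integer array.
    """
    a = np.asarray(a)
    # Shift so the smallest value maps to 0; np.bincount rejects negatives.
    shifted = a - a.min()
    counts = np.bincount(shifted.ravel())
    # Index the counts with the shifted values to get the element-wise result.
    return counts[shifted]

a = np.array([[-2, -2, 0],
              [ 3,  0, 0],
              [ 3, -2, 7]])
result = count_occurrences_int(a)
```

Note that `np.bincount` allocates one bin per integer between the minimum and the maximum, so this is a poor fit when the value range is very wide; the `np.unique` approach above avoids that.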

Extending to generic n-dims

Let's use bincount for this -

In [107]: ar
Out[107]: 
array([[[1, 1, 1],
        [2, 2, 2],
        [3, 4, 5]],

       [[2, 3, 5],
        [4, 3, 4],
        [3, 1, 2]]])

In [104]: ar2D = ar.reshape(-1,ar.shape[-2]*ar.shape[-1])

# bincount2D_vectorized from https://stackoverflow.com/a/46256361/ @Divakar
In [105]: c = bincount2D_vectorized(ar2D)

In [106]: c[np.arange(ar2D.shape[0])[:,None], ar2D].reshape(ar.shape)
Out[106]: 
array([[[3, 3, 3],
        [3, 3, 3],
        [1, 1, 1]],

       [[2, 3, 1],
        [2, 3, 2],
        [3, 1, 2]]])
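For readers who want something self-contained, the steps above could be wrapped up roughly like this. It is a sketch under assumptions: `bincount_rows` is a stand-in written for this example in the spirit of the linked row-wise bincount (the linked post's version may differ), `count_in_blocks` is likewise a made-up name, and the input is assumed to hold non-negative integers. The array `b` is the windowed example from the question, typed out by hand rather than produced with `view_as_windows`.

```python
import numpy as np

def bincount_rows(a2d):
    """Row-wise bincount for a 2D array of non-negative ints (hypothetical helper)."""
    n = a2d.max() + 1
    # Shift each row into its own block of n bins so one bincount call covers all rows.
    offsets = np.arange(a2d.shape[0])[:, None] * n
    flat = np.bincount((a2d + offsets).ravel(), minlength=a2d.shape[0] * n)
    return flat.reshape(-1, n)

def count_in_blocks(ar):
    """Count occurrences of each value within each trailing 2D block of `ar`."""
    ar = np.asarray(ar)
    # Flatten every trailing 2D block into one row.
    ar2d = ar.reshape(-1, ar.shape[-2] * ar.shape[-1])
    counts = bincount_rows(ar2d)
    # For each row, look up the count of every value in that same row.
    rows = np.arange(ar2d.shape[0])[:, None]
    return counts[rows, ar2d].reshape(ar.shape)

b = np.array([[[[1, 1], [2, 2]],
               [[1, 1], [2, 2]]],
              [[[2, 2], [3, 4]],
               [[2, 2], [4, 5]]]])
out = count_in_blocks(b)
```

Here `out` has shape (2, 2, 2, 2) and matches the expected output given in the question, with counts taken per 2x2 window rather than over the whole array.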
Gulzar
Divakar
  • Awesome!! I just edited my post to give an example for the generic n-dims solution. I'm gonna play around with your "generic n-dims" solution for a few minutes and see if I can massage it to match my example. – CaptainStiggz Nov 06 '17 at 05:47
  • @CaptainStiggz For performance, play around with other options as well to do binned counting at that post - https://stackoverflow.com/a/46256361/. – Divakar Nov 06 '17 at 05:54
  • This is brilliant! Can you make any reading recommendations for developing a better intuition for numpy basics? Some of the reshaping you're doing still feels like black magic to a numpy beginner like me. The docs are a bit sparse when it comes to more complex use cases. For example, [the docs](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.shape.html) on `np.shape` don't cover negative axes, like you use in `ar.reshape(-1,ar.shape[-2]*ar.shape[-1])` – CaptainStiggz Nov 06 '17 at 06:47
  • @CaptainStiggz Well, `ar.shape` is the shape tuple, and `ar.shape[-1]` gets us the last element of that tuple, i.e. the length of the last axis of the array. `-2` is the second-to-last element, hence the length of the array along the second-to-last axis. The idea is that we need the combined length along the last two axes for the reshaping. Also, the `-1` in `ar.reshape(-1,..)` basically means compute the remaining length automatically, while keeping the reshaped array as 2D. For reference, I think the official docs are pretty good. – Divakar Nov 06 '17 at 07:49
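The indexing and reshaping described in the last comment can be checked with a small sketch. It uses a placeholder array of zeros with the (4, 2, 3, 3) shape from the question; the names `block_len` and `ar2d` are chosen just for this illustration.

```python
import numpy as np

# Placeholder array with the shape mentioned in the question.
ar = np.zeros((4, 2, 3, 3), dtype=int)

# Negative indices count from the end of the shape tuple.
block_len = ar.shape[-2] * ar.shape[-1]   # 3 * 3 = 9 elements per trailing block

# -1 asks NumPy to infer the remaining length: 4 * 2 = 8 rows.
ar2d = ar.reshape(-1, block_len)
```

Each row of `ar2d` then holds one flattened 3x3 block, which is the layout the row-wise counting step expects.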