
Suppose I have a NumPy array of shape (1, 4, 5):

import numpy as np

arr = np.array([[[0, 0, 0, 3, 0],
                 [0, 0, 2, 3, 2],
                 [0, 0, 0, 0, 0],
                 [2, 1, 0, 0, 0]]])

I would like to find the most frequent non-zero value in the array along a specific axis, returning zero only if there are no non-zero values.

For example, with axis=2 I would like to get something like [[3,2,0,2]] from this array (for the last row, either 1 or 2 would be fine). Is there a good way to implement this?

I've tried the solution in the following question (Link), but I am unsure how to modify it so that it excludes a specific value. Thanks again!

Ehsan
Axd
  • Does your array include negative values as well? Could you also provide your desired output for the example above with axis=0 and axis=1? – Ehsan Aug 04 '20 at 02:39
  • I am currently working with a simple non-negative integer array. I'm not sure what the output should look like for axis=0, but for axis=1 it should look like `[[2, 1, 2, 3, 2]]`, I think. – Axd Aug 04 '20 at 05:11
  • Is it that your array is always 3-D non-negative and you are only interested in axis=1 and 2? This will make a difference in what the best solution would be. – Ehsan Aug 04 '20 at 08:18

2 Answers


We can use `numpy.apply_along_axis` with a small helper function to solve this. The helper uses `numpy.bincount` to count the occurrences of each non-negative integer value and `numpy.argmax` to pick the value with the highest count. If the slice contains no values other than `exclude`, the helper returns `exclude`.

Code:

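import numpy as np

def get_freq(array, exclude):
    # Count occurrences of every value except `exclude` (non-negative integers only).
    count = np.bincount(array[array != exclude])
    if count.size == 0:
        # Nothing left after filtering: the slice was empty or held only `exclude`.
        return exclude
    # The index of the largest count is the most frequent value.
    return np.argmax(count)

# Apply the helper to every 1-D slice along axis=2.
np.apply_along_axis(lambda x: get_freq(x, 0), axis=2, arr=arr)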

Output:

array([[3, 2, 0, 1]])

Note that it also returns `exclude` if you pass an empty array.
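
To see why the last row yields 1 rather than 2, it can help to run the two NumPy calls on a single row. The sketch below is only an illustration; the variable names (`row`, `nonzero`, `counts`, `tied_row`, `tie_winner`) are made up for this example and are not part of the answer's code.

```python
import numpy as np

# One row from the example array, with the excluded value (0) filtered out.
row = np.array([0, 0, 2, 3, 2])
nonzero = row[row != 0]            # array([2, 3, 2])

# bincount returns one counter per integer from 0 up to the largest value present.
counts = np.bincount(nonzero)      # array([0, 0, 2, 1])
most_frequent = np.argmax(counts)  # 2

# On a tie, argmax returns the first maximum, so the smallest tied value wins.
tied_row = np.array([2, 1, 0, 0, 0])
tie_winner = np.argmax(np.bincount(tied_row[tied_row != 0]))  # 1
```

Because `np.bincount` allocates one counter per integer up to the maximum, this approach suits small non-negative integers best.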

EDIT: As Ehsan noted, the above solution does not work if the array contains negative values. In that case, use `Counter` from `collections`:

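from collections import Counter

import numpy as np

arr = np.array([[[0, -3, 0,  3, 0],
                 [0,  0, 2,  3, 2],
                 [0,  0, 0,  0, 0],
                 [2, -5, 0, -5, 0]]])

def get_freq(array, exclude):
    # most_common(1) gives [(value, count)], or [] if nothing is left to count.
    count = Counter(array[array != exclude]).most_common(1)
    if not count:
        return exclude
    return count[0][0]

np.apply_along_axis(lambda x: get_freq(x, 0), axis=2, arr=arr)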

Output:

array([[-3,  2,  0, -5]])

`most_common(1)` returns the most frequent value in the `Counter` object as a one-element list containing a tuple: the first element of the tuple is the value and the second is its number of occurrences. Because the tuple is wrapped in a list, we need the double indexing `count[0][0]`. If the list is empty, `most_common` found nothing to count (the slice either contained only `exclude` or was empty).
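
The return shape described above can be checked on plain Python lists. This is a small sketch for illustration only; the names `top`, `value`, `empty`, and `tied` are hypothetical.

```python
from collections import Counter

# most_common(1) returns a one-element list holding a (value, count) tuple.
top = Counter([2, -5, -5]).most_common(1)   # [(-5, 2)]
value = top[0][0]                           # -5

# With nothing to count, the list is empty, which is falsy.
empty = Counter([]).most_common(1)          # []

# Tied values come back in the order they were first encountered.
tied = Counter([-3, 3]).most_common(1)      # [(-3, 1)]
```

The tie-breaking rule is why the first row of the example yields -3 rather than 3.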

errno98
  • `np.bincount` is for non-negative arrays. This solution fails if array has negative values. – Ehsan Aug 04 '20 at 02:38

This is an alternative solution (perhaps less efficient than the one above, but a different approach):

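import numpy as np

# One-hot encode each value (np.eye needs at least arr.max() + 1 rows), drop the
# column for 0, and sum along axis=2 to count each non-zero value per row.
# This only works for non-negative integer arrays.
counts = np.sum(np.eye(arr.max() + 1, dtype=int)[arr][:, :, :, 1:], axis=2)

# argmax gives the most frequent non-zero value (offset by 1 for the dropped 0 column).
most_frequent = np.argmax(counts, axis=2) + 1

# Rows without any non-zero value have all-zero counts; return 0 for them.
result = np.where(counts.any(axis=2), most_frequent, 0)

print(result)
[[3 2 0 1]]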
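
The counting step above relies on indexing an identity matrix with the data, which may be easier to follow on a single row. The following is a minimal sketch, assuming values in the range 0..3; the names `row`, `one_hot`, and `counts` are chosen for this illustration only.

```python
import numpy as np

row = np.array([0, 0, 2, 3, 2])

# Indexing an identity matrix with the row picks one one-hot vector per element.
one_hot = np.eye(4, dtype=int)[row]   # shape (5, 4)

# Summing the one-hot vectors counts how often each value 0..3 occurs.
counts = one_hot.sum(axis=0)          # array([2, 0, 2, 1])

# Dropping column 0 ignores zeros; add 1 to map the argmax back to a value.
most_frequent = np.argmax(counts[1:]) + 1   # 2
```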
Akshay Sehgal