
I'm trying to find the most frequent elements in a two-dimensional NumPy array, either row-wise or column-wise. I searched the docs and the web but couldn't find exactly what I'm looking for. Let me explain with an example; assume I have an arr as follows:

import numpy as np
arr = np.random.randint(0, 2, size=(5, 2))
arr

# Output
array([[1, 1],
       [0, 0],
       [0, 1],
       [1, 1],
       [1, 0]])

The expected output is an array that contains the most frequent element in each column or row, depending on the given axis input. I know that np.unique() returns the count of each unique value in the input array for a given axis, so it counts unique rows or columns in a 2-D array:

np.unique(arr, return_counts=True, axis=0)

# Output
(array([[0, 0],
       [0, 1],
       [1, 0],
       [1, 1]]), array([1, 1, 1, 2]))

So, it tells me that the unique rows [0, 0], [0, 1] and [1, 0] occur once whereas [1, 1] occurs twice in arr. This does not work for me, because I need the most frequent element within each row (or column). So my expected output is as follows:

array([[1, 1],    # --> 1
       [0, 0],    # --> 0
       [0, 1],    # --> 0 or 1 since they have same frequency
       [1, 1],    # --> 1
       [1, 0]])   # --> 0 or 1 since they have same frequency

Consequently, the result can be array([1, 0, 0, 1, 0]) or array([1, 0, 1, 1, 1]) with shape (5, ).

PS:

I know that a solution can be found by iterating over columns or rows and finding the most frequent element of each using np.unique(); however, I want the most efficient way of doing this. NumPy is generally used for vectorized calculations on huge arrays, and in my case arr has a very large number of elements, so the computation would be costly with a for loop.

EDIT:

To be more clear, I added a loop-based solution. Since arr can contain not only 0s and 1s but arbitrary values, I decided to use a different randomized arr:

arr = np.random.randint(1, 4, size=(10, 3)) * 10

# arr:
array([[30, 30, 30],
       [10, 20, 30],
       [30, 30, 30],
       [30, 10, 20],
       [20, 20, 10],
       [20, 30, 20],
       [20, 30, 10],
       [10, 30, 10],
       [20, 10, 10],
       [20, 30, 30]])

most_freq_elem_in_rows = []
for row in arr:
  elements, counts = np.unique(row, return_counts=True)
  most_freq_elem_in_rows.append(elements[np.argmax(counts)])

# most_freq_elem_in_rows:
# [30, 10, 30, 10, 20, 20, 10, 10, 10, 30]

most_freq_elem_in_cols = []
for col in arr.T:
  elements, counts = np.unique(col, return_counts=True)
  most_freq_elem_in_cols.append(elements[np.argmax(counts)])

# most_freq_elem_in_cols:
# [20, 30, 10]

Then, most_freq_elem_in_rows and most_freq_elem_in_cols can be converted to NumPy arrays simply using np.array().
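For comparison, a loop-free sketch of the row-wise case is possible with a bincount offset trick, assuming the values are non-negative integers (rowwise_mode is a hypothetical helper name, not part of NumPy):

```python
import numpy as np

def rowwise_mode(a):
    """Row-wise mode of a 2-D array of non-negative integers.

    Shifts each row into its own range of bins so a single
    np.bincount call counts all rows at once; ties resolve to the
    smallest value, matching np.unique + argmax in the loop above.
    """
    n_rows = a.shape[0]
    m = a.max() + 1                            # bins per row
    offsets = np.arange(n_rows)[:, None] * m   # each row gets its own bin range
    counts = np.bincount((a + offsets).ravel(), minlength=n_rows * m)
    return counts.reshape(n_rows, m).argmax(axis=1)

arr = np.array([[30, 30, 30],
                [10, 20, 30],
                [30, 30, 30],
                [30, 10, 20],
                [20, 20, 10],
                [20, 30, 20],
                [20, 30, 10],
                [10, 30, 10],
                [20, 10, 10],
                [20, 30, 30]])

rowwise_mode(arr)    # row-wise: [30, 10, 30, 10, 20, 20, 10, 10, 10, 30]
rowwise_mode(arr.T)  # column-wise: [20, 30, 10]
```

Note that this allocates n_rows * (max + 1) bins, so it is only practical when the maximum value is small relative to the array size.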

cottontail
Ersel Er

1 Answer


If you can add a SciPy dependency, then scipy.stats.mode achieves that:

import numpy as np
from scipy.stats import mode

arr = np.random.randint(0, 2, size=(5, 2))

mode(arr, 0)
# ModeResult(mode=array([[0, 0]]), count=array([[3, 3]]))

mode(arr, 1)
# ModeResult(mode=array([[0],
#                        [1],
#                        [0],
#                        [0],
#                        [0]]),
#            count=array([[1],
#                         [2],
#                         [2],
#                         [2],
#                         [1]]))
FBruzzesi