Numpy - find most common item per row

Question

I have a matrix like this in NumPy:

array([[0, 0, 1, 1],
       [1, 1, 0, 2],
       [0, 0, 1, 0],
       [0, 2, 1, 1],
       [1, 1, 1, 0],
       [1, 0, 2, 2]])

I'd like to get the most common value per row. In other words, I'd like to get a vector like this:

array([0, 1, 0, 1, 1, 2])

I managed to solve this problem using SciPy's mode function, in the following way:

scipy.stats.mode(data, axis=1)[0].flatten()

However, I'm looking for a solution which uses NumPy only. Moreover, the solution needs to work with negative integer values as well.

score 1 · Accepted Answer · answered Nov 25 '20 at 13:26

Supposing m is the name of your matrix:

most_f = np.array([np.bincount(row).argmax() for row in m])

I hope this answers your question.

Borja_042

Thanks, that worked :) Is there a vectorized way, though? – David Lasry Nov 25 '20 at 14:27
However, it doesn't support negative numbers. – dspr Nov 25 '20 at 14:28
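
On both comments above: one possible NumPy-only way to vectorize this while also handling negative values is to map every value to a non-negative code with np.unique(..., return_inverse=True), then count all rows in a single np.bincount call by giving each row its own offset. This is only a minimal sketch, assuming a 2-D integer array; the name row_mode is illustrative, and ties resolve to the smallest value.

```python
import numpy as np

def row_mode(m):
    """Most common value per row of a 2-D integer array (hypothetical helper)."""
    m = np.asarray(m)
    n_rows = m.shape[0]
    # Map each distinct value (negative or not) to a code in 0..n_values-1.
    values, inverse = np.unique(m, return_inverse=True)
    inverse = inverse.reshape(m.shape)  # shape of `inverse` differs across NumPy versions
    n_values = values.size
    # Shift the codes of row i by i * n_values so that rows do not collide,
    # then count everything with one bincount call.
    offsets = np.arange(n_rows)[:, None] * n_values
    counts = np.bincount((inverse + offsets).ravel(), minlength=n_rows * n_values)
    counts = counts.reshape(n_rows, n_values)
    # argmax returns the first maximum, i.e. the smallest value on ties.
    return values[counts.argmax(axis=1)]
```

The intermediate count table has shape (n_rows, n_values), so this trades memory for speed when the array contains many distinct values.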

score 1 · Answer 2 · answered Apr 10 '22 at 00:03

If your labels are from 0 to n_labels - 1, you can use

labels_onehot = m[..., None] == np.arange(n_labels)[None, None, :]  # (n_rows, n_cols, n_labels), one-hot encoded
labels_count = np.count_nonzero(labels_onehot, axis=1)              # (n_rows, n_labels), number of occurrences of each label in a row
most_frequent = np.argmax(labels_count, axis=-1)                    # (n_rows,), the most frequent label of each row

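This is fully vectorized (no list comprehension, no apply_along_axis), so it is usually faster than the loop-based solutions in the other answers (and arguably simpler too). The price is an intermediate boolean array of shape (n_rows, n_cols, n_labels), which can get large when there are many labels.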

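If your labels are not from 0 to n_labels - 1, you can replace np.arange(n_labels) above with an array listing your labels. argmax then returns positions in that array rather than the labels themselves, so index the array with most_frequent to get the actual labels.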
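
The arbitrary-label case can be sketched as follows. This is just an illustration: the function name row_mode_onehot and the default of taking the labels from np.unique(m) are assumptions, not part of the snippet above.

```python
import numpy as np

def row_mode_onehot(m, labels=None):
    """Most frequent label per row via broadcasting (hypothetical helper)."""
    m = np.asarray(m)
    # Assumption: if no label array is given, use the distinct values of m.
    labels = np.unique(m) if labels is None else np.asarray(labels)
    onehot = m[..., None] == labels            # (n_rows, n_cols, n_labels)
    counts = np.count_nonzero(onehot, axis=1)  # (n_rows, n_labels)
    # argmax gives a position in `labels`; index back to get the label itself.
    return labels[np.argmax(counts, axis=-1)]
```

Since the labels can be any integers here, negative values work without any shifting.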

score 0 · Answer 3 · answered Feb 26 '22 at 13:34

I've adapted Def_Os's answer from the following post:

Most efficient way to find mode in numpy array

The following function uses numpy only, and works with negatives.

import numpy as np
def mode_row(ar):
    _min = np.min(ar)
    adjusted = False
    if _min < 0:
        ar = ar - _min
        adjusted = True
    ans = np.apply_along_axis(lambda x: np.bincount(x).argmax(), axis=1, arr=ar)
    if adjusted:
        ans = ans + _min
    return ans

A = np.array([[0, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 0, 1, 0],
              [0, 2, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 2, 2]])

B = A - 1

mode_row(A)
mode_row(B)

array([0, 1, 0, 1, 1, 2], dtype=int64)

array([-1, 0, -1, 0, 0, 1], dtype=int64)
