4

I have a matrix like this in NumPy:

array([[0, 0, 1, 1],
       [1, 1, 0, 2],
       [0, 0, 1, 0],
       [0, 2, 1, 1],
       [1, 1, 1, 0],
       [1, 0, 2, 2]])

I'd like to get the most common value per row. In other words, I'd like to get a vector like this:

array([0, 1, 0, 1, 1, 2])

I managed to solve this problem using Scipy's mode method, in the following way:

scipy.stats.mode(data, axis=1)[0].flatten()

However, I'm looking for a solution which uses NumPy only. Moreover, the solution needs to work with negative integer values as well

David Lasry
  • 829
  • 3
  • 12
  • 31

3 Answers3

1

Supposing m is the name of your matrix:

most_f = np.array([np.bincount(row).argmax() for row in m])

I hope this solves your question

Borja_042
  • 1,071
  • 1
  • 14
  • 26
1

If your labels are from 0 to n_labels - 1, you can use

labels_onehot = m[..., None] == np.arange(n_labels)[None, None, :] #(n_rows, n_cols, n_labels) one-hot encoded
labels_count = np.count_nonzero(labels_onehot,axis=1)              #(n_rows, n_labels), contains the number of occurence of each label in a row
most_frequent = np.argmax(labels_onehot, axis=-1)                  #(n_rows,) contains the most frequent label

Which is fully vectorized (no list comprehension, no apply_along_axis), so more efficient than the solutions proposed above in terms of speed (and kind of simpler too).

If your labels are not from 0 to n_labels - 1, you can replace np.arange(n_labels) above by an array indexing your labels to get the same result.

tbrugere
  • 755
  • 7
  • 17
0

I've adapted Def_Os answer from the following post:

Most efficient way to find mode in numpy array

The following function uses numpy only, and works with negatives.

import numpy as np
def mode_row(ar):
    _min = np.min(ar)
    adjusted = False
    if _min < 0:
        ar = ar - _min
        adjusted = True
    ans = np.apply_along_axis(lambda x: np.bincount(x).argmax(), axis=1, arr=ar)
    if adjusted:
        ans = ans + _min
    return ans

A = np.array([[0, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 0, 1, 0],
              [0, 2, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 2, 2]])

B = A - 1

mode_row(A)
mode_row(B)

array([0, 1, 0, 1, 1, 2], dtype=int64)

array([-1, 0, -1, 0, 0, 1], dtype=int64)

Self Dot
  • 352
  • 3
  • 14