What is the most efficient way to find the per-row mode of the non-zero elements in a multi-dimensional array?

For example:

[
 [0.  0.4 0.6 0.  0.6 0.  0.6 0.  0.  0.6 0.  0.6 0.6 0.6 0.  0.  0.  0.6
     0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.6 0.  0.  0.6 0.6 0.6 0.  0.  0.6
     0.6 0.6 0.  0.5 0.6 0.6 0.  0.  0.6 0.  0.6 0.  0.  0.6],
 [0.  0.1 0.2 0.1 0.  0.1 0.1 0.1 0.  0.1 0.  0.  0.  0.1 0.1 0.  0.1 0.1
 0.  0.1 0.1 0.1 0.  0.1 0.1 0.1 0.  0.1 0.2 0.  0.1 0.1 0.  0.1 0.1 0.1
 0.  0.2 0.1 0.  0.1 0.  0.1 0.1 0.  0.1 0.  0.1 0.  0.1]
]

The per-row mode of the above is [0, 0.1], but ideally we want to ignore the zeros and return [0.6, 0.1].

James
  • Possible duplicate of [Most efficient way to find mode in numpy array](https://stackoverflow.com/questions/16330831/most-efficient-way-to-find-mode-in-numpy-array) – yatu Feb 14 '19 at 20:45
  • While Nick's solution works, this would be done in a much simpler way if you were using pandas instead of numpy. – Griffin Feb 14 '19 at 21:29
  • If you're open to using pandas as what @Griffin suggested, I'd be more than happy to write an answer as well... Unless Griffin wants to do it first! – rayryeng Feb 14 '19 at 21:55

3 Answers

You can use the same approach as the question linked in the comments by @yatu, but filter the data with numpy.nonzero() first.

numpy.nonzero() returns the indices of the non-zero elements, so indexing the array with those indices yields just the non-zero values. If a is a NumPy array:

a[np.nonzero(a)]
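
As a quick illustration of what that indexing does, here is a minimal sketch. The small array `demo` is made up for this example and is not taken from the question. Note that on a 2-D array the result is flattened to 1-D, so per-row results need the indexing applied to one row at a time.

```python
import numpy as np

# Hypothetical toy array, used only to show the indexing behaviour
demo = np.array([[1, 0, 4],
                 [0, 2, 0]])

# np.nonzero returns one index array per dimension
rows, cols = np.nonzero(demo)

# Indexing with those arrays gives the non-zero values as a flat 1-D array
flat_nonzero = demo[np.nonzero(demo)]

# Applied to a single row, the row structure is preserved implicitly
first_row_nonzero = demo[0][np.nonzero(demo[0])]

print(flat_nonzero)       # [1 4 2]
print(first_row_nonzero)  # [1 4]
```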

Example finding the mode (building off code from the other answer):

import numpy as np
from scipy import stats

a = np.array([
    [1, 0, 4, 2, 2, 7],
    [5, 2, 0, 1, 4, 1],
    [3, 3, 2, 0, 1, 1]]
)

def nonzero_mode(arr):
    return stats.mode(arr[np.nonzero(arr)]).mode

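# Iterating over a 2-D array yields its rows. In Python 3, map() is lazy,
# so wrap it in list() to see the values instead of a map object.
m = list(map(nonzero_mode, a))
print(m)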

The same per-row result can also be obtained with an explicit loop over the array:

for row in a:
    print(nonzero_mode(row))
Nick

Adapted from this answer, with the zero element removed:

import numpy as np

def mode(arr):
    """
    Function: mode, to find the mode of the non-zero elements of an array.
    ---
    Parameters:
    @param: arr, nd array, any.
    ---
    @return: the most frequent non-zero value (whatever int/float/etc) of this array.
    """
    vals,counts = np.unique(arr, return_counts=True)
    if 0 in vals:
        z_idx = np.where(vals == 0)
        vals   = np.delete(vals,   z_idx)
        counts = np.delete(counts, z_idx)
    index = np.argmax(counts)
    return vals[index]
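
To get one mode per row, a function like this has to be applied to each row in turn. The following is a minimal sketch of that idea, not the answer author's code: `nonzero_mode` is a name chosen here for illustration, and it uses a boolean mask instead of np.delete. It assumes every row has at least one non-zero value (np.argmax raises a ValueError on an empty array). On ties, the smallest value wins because np.unique returns sorted values.

```python
import numpy as np

def nonzero_mode(row):
    """Most frequent non-zero value of a 1-D array (illustrative helper)."""
    # Count only the non-zero entries
    vals, counts = np.unique(row[row != 0], return_counts=True)
    # argmax returns the first maximum, i.e. the smallest value on ties
    return vals[np.argmax(counts)]

a = np.array([
    [1, 0, 4, 2, 2, 7],
    [5, 2, 0, 1, 4, 1],
    [3, 3, 2, 0, 1, 1],
])

# Iterating over a 2-D array yields its rows
row_modes = np.array([nonzero_mode(row) for row in a])
print(row_modes)  # [2 1 1]
```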
Bilal

Inspired by this answer, you can replace the zeros with np.nan and let stats.mode skip them with nan_policy='omit':

import numpy as np
from scipy import stats

a = np.array([
    [1, 0, 4, 2, 2, 7],
    [5, 2, 0, 1, 4, 1],
    [3, 3, 2, 0, 1, 1]]
)
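# Replace zeros with NaN (this also converts the integer array to float)
nonzero_a = np.where(a == 0, np.nan, a)
# Row-wise mode, ignoring the NaN entries
mode, count = stats.mode(nonzero_a, axis=1, nan_policy='omit')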

This gives the following result (the exact output format may differ between SciPy versions):

mode:

masked_array(
  data=[[2.],
        [1.],
        [1.]],
  mask=False,
  fill_value=1e+20)

count:

masked_array(
  data=[[2.],
        [2.],
        [2.]],
  mask=False,
  fill_value=1e+20)

Note that if all the values along the counting axis are np.nan (that is, the row contained only zeros), the mode is undefined.
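
One way to handle that edge case is to detect all-zero rows up front and leave their result as NaN. The sketch below uses only NumPy rather than stats.mode, so it is an alternative illustration and not the approach shown above. The array `b` and the variable names are made up for the example.

```python
import numpy as np

# Hypothetical data: the middle row has no non-zero values
b = np.array([
    [1, 0, 2, 2],
    [0, 0, 0, 0],
    [3, 3, 0, 1],
])

# True for rows that contain at least one non-zero value
has_nonzero = (b != 0).any(axis=1)

# Start with NaN everywhere, then fill in the rows that have a defined mode
row_modes = np.full(b.shape[0], np.nan)
for i in np.flatnonzero(has_nonzero):
    vals, counts = np.unique(b[i][b[i] != 0], return_counts=True)
    row_modes[i] = vals[np.argmax(counts)]

print(row_modes)  # [ 2. nan  3.]
```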

qun