I want to calculate the mode of columns in a numpy array, excluding a specific value (0) from the calculation.

Example numpy array:

import numpy as np

n = np.array([[0, 2, 1], [0, 1, 3], [1, 2, 3]])
>array([[0, 2, 1],
   [0, 1, 3],
   [1, 2, 3]])

Create a mask for where the values don't equal 0:

m_mask = n != 0    
>array([[False,  True,  True],
   [False,  True,  True],
   [ True,  True,  True]])

Apply the mask and calculate the mode on axis 0:

from scipy.stats import mode

new_m = np.ma.array(n, mask = m_mask)
m=mode(new_m, axis=0)
m[0]  #access the values not the count
>array([[0, 2, 3]])

It seems like scipy.stats.mode may be ignoring the masked array?

Any ideas on how I can accomplish this?

proximacentauri

1 Answer

I don't think np.ma.array(...) fits here. You can replace the line where you assign new_m with:

new_m = np.where(m_mask, n, np.nan)

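(scipy.stats.mode(...) will then ignore the NaNs; depending on your SciPy version, you may need to pass nan_policy='omit' for them to be skipped. Note that inserting np.nan converts the array to float.) Output: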

[[1. 2. 3.]]
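
If you would rather not depend on how scipy.stats.mode treats NaNs, here is a minimal NumPy-only sketch of the same idea. The helper name column_mode_excluding is hypothetical (not part of NumPy or SciPy), and breaking ties toward the smallest value is an assumption of this sketch. It also shows why the mask in the question hides the wrong entries: in np.ma, True marks a value as masked (hidden), so mask=(n != 0) keeps only the zeros.

```python
import numpy as np

def column_mode_excluding(arr, exclude=0):
    """Most frequent value in each column, skipping `exclude`.

    Hypothetical helper for illustration. Ties go to the smallest value,
    because np.unique returns sorted values and argmax picks the first maximum.
    """
    modes = []
    for col in arr.T:                      # iterate over columns
        kept = col[col != exclude]         # drop the excluded value
        if kept.size == 0:
            modes.append(np.nan)           # column holds nothing but `exclude`
            continue
        values, counts = np.unique(kept, return_counts=True)
        modes.append(values[np.argmax(counts)])
    return np.array(modes)

n = np.array([[0, 2, 1], [0, 1, 3], [1, 2, 3]])

# In np.ma, mask=True means "hide this entry", so this keeps only the zeros.
zeros_only = np.ma.array(n, mask=(n != 0)).compressed()
print(zeros_only)   # [0 0]

result = column_mode_excluding(n)
print(result)       # [1 2 3]
```

Unlike the np.where approach, this keeps the integer dtype as long as no column consists entirely of the excluded value.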
Grzegorz Skibinski