4

I have a NumPy masked n-d array and need to find the median along a specific axis. In some cases I end up with an even number of elements, in which case numpy.ma.median returns the average of the two middle elements. However, I don't want the average. I want one of the two middle elements; either one is fine. How do I get this?

MWE:

>>> import numpy
>>> data=numpy.arange(-5,10).reshape(3,5)
>>> mdata=numpy.ma.masked_where(data<=0,data)
>>> numpy.ma.median(mdata, axis=0)
masked_array(data=[5.0, 3.5, 4.5, 5.5, 6.5],
             mask=[False, False, False, False, False],
       fill_value=1e+20)

As you can see, it averages the two middle values (1 and 6) and returns a fractional value (3.5). I want either 1 or 6.

Nagabhushan S N
  • 1
    What you're asking for is not a median. The definition of "median" is a value where half the elements are greater and half are less. With an even number of elements, if you pick one of the elements, then you don't have half above and below. – Tim Roberts Jul 18 '21 at 04:41
  • Okay, thanks! Do you know what it is called? In any case, I've made it clear what I need, right? Is there any ambiguity there, that I need to address? – Nagabhushan S N Jul 18 '21 at 04:43
  • 1
    For an even number of elements, the median is the average of the two middle numbers. However, if you don't want the average and just want either of the two middle numbers, you can drop an element from your collection when calling the median method. That makes the length of the collection odd, so the result is a value present in the collection rather than an average (though this is not a proper way to find the median). – Pranta Palit Jul 18 '21 at 04:47
  • Right. The issue is, there IS no right answer. Consider the collection [1,6]. If either 1 or 6 is a right answer, then how can your results ever be repeatable? As Pranta says, just drop one row and you'll get what you want. – Tim Roberts Jul 18 '21 at 04:48
  • Reproducibility can be achieved by always selecting the lower element – Nagabhushan S N Jul 18 '21 at 04:51

3 Answers

2

For an even number of elements, the median is the average of the two middle numbers. However, if you don't want the average and just want either of the two middle numbers, you can drop an element from your collection when calling the median method. That makes the length of the collection odd, so you'll get what you want, not the average (though this is not a proper way to find the median).
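
A minimal sketch of the dropping idea for a single 1-D slice, assuming the masked entries are removed first with `numpy.ma.compressed`. The helper name `median_by_dropping` and its `drop` parameter are made up for illustration and are not part of NumPy. Which end is dropped decides whether the lower or the upper middle element comes out.

```python
import numpy

def median_by_dropping(values, drop="max"):
    """Median that returns an actual element (hypothetical helper).

    For an even count, one extreme is dropped first so the count
    becomes odd: drop="max" yields the lower middle element,
    drop="min" the upper one.
    """
    vals = numpy.sort(numpy.ma.compressed(values))  # unmasked values, sorted
    if vals.size > 0 and vals.size % 2 == 0:
        vals = vals[:-1] if drop == "max" else vals[1:]
    return numpy.median(vals)

data = numpy.arange(-5, 10).reshape(3, 5)
mdata = numpy.ma.masked_where(data <= 0, data)
column = mdata[:, 1]                       # unmasked values: 1 and 6
lower = median_by_dropping(column, "max")  # 1.0
upper = median_by_dropping(column, "min")  # 6.0
print(lower, upper)
```

This works on one slice at a time. Along an axis of a masked array, each slice can have a different number of unmasked values, so dropping a whole row does not in general make every slice odd.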

Pranta Palit
  • Depending on overall context, one might consider adding an element, instead of dropping, because then you might more easily know which element was added, and can drop it again after, leaving the original data "unaffected". Also, which side to drop from/add to affects whether median picks next lesser or next greater. I'm considering these things in light of potentially matching Numpy's median to IDL's median - since I need to match output from IDL, and IDL's /EVEN option was not used. IDL picks the next greater, apparently. – GG2 May 12 '22 at 23:58
  • However, downside to adding an element is it might require figuring out the min/max value in the data. – GG2 May 12 '22 at 23:58
0

It is expected to average when you have an even number of elements. Suppose you have an array of elements from 1 to 10. Then the median is expected to be the average of 5 and 6, which is 5.5. If you have elements from 1 to 11, then the median is 6. Hope this clarifies.
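
The two cases can be checked directly with `numpy.median`; the variable names here are only for illustration.

```python
import numpy

even = numpy.arange(1, 11)   # 1..10, ten elements
odd = numpy.arange(1, 12)    # 1..11, eleven elements

median_even = numpy.median(even)  # average of 5 and 6 -> 5.5
median_odd = numpy.median(odd)    # the single middle element -> 6.0
print(median_even, median_odd)
```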

dinesh kumar
0
  • numpy.percentile(array, 50) gives the median value.
  • numpy.percentile has an option to set the interpolation to nearest.
  • However, this function is not available in the numpy.ma module.
  • The trick used in this answer can be used here.

The idea is to fill invalid values with nan and use numpy.nanpercentile() with nearest interpolation.

>>> mdata1 = numpy.ma.filled(mdata.astype('float'), numpy.nan)
>>> numpy.nanpercentile(mdata1, 50, axis=0, interpolation='nearest')
array([5., 1., 2., 3., 4.])
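
To make the choice explicit instead of depending on how 'nearest' resolves a tie between the two middle values, one option is to sort each slice and index the lower middle element directly, in line with the "always select the lower element" idea from the comments on the question. Below is a minimal sketch. `lower_median` is a hypothetical helper (not part of NumPy), and it assumes the data contain no genuine `inf` values, because masked entries are temporarily replaced with `+inf` so that they sort last.

```python
import numpy

def lower_median(marr, axis=0):
    """Lower middle element along `axis`, ignoring masked entries.

    Hypothetical helper, not a NumPy function. Assumes the unmasked
    data contain no +inf values.
    """
    # Number of unmasked elements in each slice along the axis.
    count = numpy.ma.count(marr, axis=axis)
    # Replace masked entries with +inf so they end up last after sorting.
    filled = numpy.ma.filled(marr.astype(float), numpy.inf)
    srt = numpy.sort(filled, axis=axis)
    # Index of the lower middle element; clip handles fully masked slices.
    idx = numpy.clip((count - 1) // 2, 0, None)
    picked = numpy.take_along_axis(srt, numpy.expand_dims(idx, axis), axis=axis)
    picked = numpy.squeeze(picked, axis=axis)
    # Slices with no unmasked values stay masked in the result.
    return numpy.ma.masked_where(count == 0, picked)

data = numpy.arange(-5, 10).reshape(3, 5)
mdata = numpy.ma.masked_where(data <= 0, data)
result = lower_median(mdata, axis=0)
print(result)
```

Separately, NumPy 1.22 renamed the `interpolation` argument of the percentile functions to `method`, so on recent versions the call above is spelled with `method='nearest'`.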
Nagabhushan S N