Looking at the np.percentile code, it is clear that it does nothing special with masked arrays.
def percentile(a, q, axis=None, out=None,
               overwrite_input=False, interpolation='linear', keepdims=False):
    q = array(q, dtype=np.float64, copy=True)
    r, k = _ureduce(a, func=_percentile, q=q, axis=axis, out=out,
                    overwrite_input=overwrite_input,
                    interpolation=interpolation)
    if keepdims:
        if q.ndim == 0:
            return r.reshape(k)
        else:
            return r.reshape([len(q)] + k)
    else:
        return r
Here _ureduce and _percentile are internal functions defined in numpy/lib/function_base.py, so the real work happens inside those helpers and is more involved than this wrapper suggests.
Masked arrays have two strategies for working with numpy functions. One is to fill: replace the masked values with innocuous ones, for example 0 when doing a sum, or 1 when doing a product. The other is to compress the data, that is, remove all masked values.
For example:
In [997]: data=np.arange(-5,10)
In [998]: mdata=np.ma.masked_where(data<0,data)
In [1001]: np.ma.filled(mdata,0)
Out[1001]: array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [1002]: np.ma.filled(mdata,1)
Out[1002]: array([1, 1, 1, 1, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [1008]: mdata.compressed()
Out[1008]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Which of these is going to give you the desired percentile? Filling, compressing, or neither? You need to understand the concept of a percentile well enough to know how it should apply to your masked values.
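To make the difference concrete, here is a minimal sketch that computes the median (50th percentile) of the same mdata under each strategy. The variable names p_compressed, p_filled and p_nan are only illustrative, and the third variant (fill with NaN, then call np.nanpercentile) is an extra option I am assuming may suit your case; it is not something np.percentile does for you.

```python
import numpy as np

data = np.arange(-5, 10)
mdata = np.ma.masked_where(data < 0, data)  # the five negative values are masked

# Strategy 1: compress, i.e. drop the masked values before computing.
p_compressed = np.percentile(mdata.compressed(), 50)  # median of 0..9 -> 4.5

# Strategy 2: fill the masked values with 0. The extra zeros take part
# in the ranking and pull the percentile down.
p_filled = np.percentile(mdata.filled(0), 50)  # -> 2.0

# Variant (an assumption about what you may want): fill with NaN and let
# np.nanpercentile skip those entries. NaN requires a float dtype.
p_nan = np.nanpercentile(mdata.astype(float).filled(np.nan), 50)  # -> 4.5

print(p_compressed, p_filled, p_nan)
```

Compressing and the NaN variant agree because both ignore the masked entries, while filling with 0 changes the answer. Note that compressed() always returns a 1-d array, so if you need a percentile along an axis of a multidimensional masked array, the NaN route is probably the easier one to adapt.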