EDIT 2022: The original answer below is outdated as of numpy v1.22.0, where the argument interpolation was deprecated and renamed method. The lower, higher and nearest options are retained for backwards compatibility and are now documented as discontinuous variations of the default linear method. New methods have also been added; see the np.percentile documentation for details.
Essentially, you can now write
np.percentile(x,p,method=m)
with m a string chosen from:
‘inverted_cdf’
‘averaged_inverted_cdf’
‘closest_observation’
‘interpolated_inverted_cdf’
‘hazen’
‘weibull’
‘linear’ (default)
‘median_unbiased’
‘normal_unbiased’
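A minimal sketch comparing these methods on a small array, assuming numpy >= 1.22 (the helper name percentiles_by_method is hypothetical, for illustration only):

```python
import numpy as np  # the `method` keyword requires numpy >= 1.22

# Method names accepted by np.percentile(..., method=...) since numpy 1.22
METHODS = [
    "inverted_cdf",
    "averaged_inverted_cdf",
    "closest_observation",
    "interpolated_inverted_cdf",
    "hazen",
    "weibull",
    "linear",  # the default
    "median_unbiased",
    "normal_unbiased",
]


def percentiles_by_method(x, p):
    """Hypothetical helper: the p-th percentile of x under each method."""
    return {m: float(np.percentile(x, p, method=m)) for m in METHODS}


x = np.arange(1.0, 11.0)  # 1.0, 2.0, ..., 10.0
for name, value in percentiles_by_method(x, 70).items():
    print(f"{name:>26}: {value:.4f}")
```

With the default linear method the 70th percentile of 1..10 is 7.3 (fractional rank 0.7 * 9 = 6.3, between the 7th and 8th sorted values); the other methods give nearby values that differ in how they place that rank.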
Older answer (numpy < v1.22):
If you are using numpy, you can also use its built-in percentile function. Since numpy 1.9.0, percentile has an "interpolation" option that lets you pick the lower, higher or nearest percentile value. The following works on unsorted arrays and finds the index of the entry nearest to the percentile:
import numpy as np
p=70 # my desired percentile, here 70%
x=np.random.uniform(-4.0,5.0,size=1000) # dummy vector, uniform on [-4, 5)
# index of array entry nearest to percentile value
pcen=np.percentile(x,p,interpolation='nearest')
i_near=abs(x-pcen).argmin()
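With numpy >= 1.22 the same lookup uses method= instead of the deprecated interpolation=. A minimal sketch; nearest_percentile_index is a hypothetical helper name:

```python
import numpy as np  # numpy >= 1.22 for the `method` keyword


def nearest_percentile_index(x, p):
    """Hypothetical helper: index into (unsorted) x of the 'nearest' percentile."""
    # With method="nearest" the result is always one of the entries of x ...
    pcen = np.percentile(x, p, method="nearest")
    # ... so the closest entry is that entry itself (first occurrence on ties).
    return int(np.abs(x - pcen).argmin())


x = np.array([5.0, 1.0, 4.0, 2.0, 3.0])  # deliberately unsorted
i_near = nearest_percentile_index(x, 50)
print(i_near, x[i_near])  # 4 3.0
```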
Most people will want the nearest percentile value, as above. For completeness, you can also get the entry just below or just above the percentile value:
# Use this to get the value (and index) of the array entry at or above the percentile value:
pcen=np.percentile(x,p,interpolation='higher')
i_high=abs(x-pcen).argmin()
# Use this to get the value (and index) of the array entry at or below the percentile value:
pcen=np.percentile(x,p,interpolation='lower')
i_low=abs(x-pcen).argmin()
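On numpy >= 1.22 these variants become method="lower" and method="higher". A minimal sketch (bracketing_indices is a hypothetical name) returning the indices of the entries just below and just above the percentile, which bracket the default linear result:

```python
import numpy as np  # numpy >= 1.22 for the `method` keyword


def bracketing_indices(x, p):
    """Hypothetical helper: (i_low, i_high) indices of the entries of x that
    np.percentile returns for method="lower" and method="higher"."""
    lo = np.percentile(x, p, method="lower")   # an entry of x, <= linear result
    hi = np.percentile(x, p, method="higher")  # an entry of x, >= linear result
    return int(np.abs(x - lo).argmin()), int(np.abs(x - hi).argmin())


rng = np.random.default_rng(0)
x = rng.uniform(-4.0, 5.0, size=1000)  # dummy data on [-4, 5)
i_low, i_high = bracketing_indices(x, 70)
pcen = np.percentile(x, 70)  # default method="linear"
print(x[i_low], pcen, x[i_high])
```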
For OLD versions of numpy (< v1.9.0), the interpolation option is not available, so the equivalent is:
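# Calculate the 70th percentile (default linear interpolation):
pcen=np.percentile(x,p)
# Index of the smallest entry >= pcen (entries below pcen are masked with +inf):
i_high=np.where(x>=pcen,x-pcen,np.inf).argmin()
# Index of the largest entry <= pcen (entries above pcen are masked with -inf):
i_low=np.where(x<=pcen,x-pcen,-np.inf).argmax()
# Index of the entry closest to pcen, on either side:
i_near=abs(x-pcen).argmin()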
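Another sketch that should work on any numpy version, including those before 1.9.0: sort once with np.argsort and read the neighbouring ranks directly. It assumes the default linear definition, where the percentile sits at the fractional rank (n - 1) * p / 100 of the sorted data; percentile_rank_indices is a hypothetical name.

```python
import math

import numpy as np


def percentile_rank_indices(x, p):
    """Hypothetical helper: (i_low, i_high) indices into x of the sorted
    entries just below/above the linear-interpolation percentile."""
    order = np.argsort(x)             # order[k] = index in x of the k-th smallest entry
    rank = (len(x) - 1) * p / 100.0   # fractional rank used by the linear definition
    return int(order[int(math.floor(rank))]), int(order[int(math.ceil(rank))])


x = np.random.uniform(-4.0, 5.0, size=1000)  # dummy vector
i_low, i_high = percentile_rank_indices(x, 70)
pcen = np.percentile(x, 70)
print(x[i_low], pcen, x[i_high])
```

When (n - 1) * p / 100 lands on or extremely close to an integer, floating-point rounding may make this differ by one rank from numpy's own lower/higher results.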
In summary:
i_high is the index of the smallest entry that is equal to or greater than the requested percentile.
i_low is the index of the largest entry that is equal to or smaller than the requested percentile.
i_near is the index of the entry closest to the percentile, which may be larger or smaller.
My results are (x is random, so yours will differ):
pcen = 2.3436832738049946
x[i_high] = 2.3523077864975441
x[i_low] = 2.339987054079617
x[i_near] = 2.339987054079617
(i_high, i_low, i_near) = (876, 368, 368)
That is, index 876 holds the closest value above pcen, but index 368 is even closer, though slightly below the percentile value.