The main difference between the signatures of numpy.percentile and pandas.quantile is the scale of the q parameter: pandas expects it in the range [0, 1], whereas numpy expects it in the range [0, 100].
Both of them use linear interpolation by default to compute these quantities. DataFrame.describe, by contrast, has a less flexible signature and only allows linear interpolation.
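To make "linear interpolation" concrete, here is a minimal sketch that computes the quartiles by hand and compares them with DataFrame.describe-style output from pandas. The helper name linear_quantile is hypothetical (it is not part of numpy or pandas), and the sketch assumes the default rule places quantile q at fractional position q * (n - 1) in the sorted data.

```python
import math

import numpy as np
import pandas as pd


def linear_quantile(values, q):
    """Hypothetical helper: quantile with linear interpolation, q in [0, 1]."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)          # fractional index into the sorted data
    lower = math.floor(position)               # index of the data point just below
    upper = min(lower + 1, len(ordered) - 1)   # index of the data point just above
    fraction = position - lower                # how far we are between the two
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


values = [18, 45, 66, 70, 76, 83, 88, 90, 90, 95, 95, 98]

by_hand = [linear_quantile(values, q) for q in (0.25, 0.50, 0.75)]

# describe() labels its percentile rows '25%', '50%' and '75%' by default
summary = pd.Series(values).describe()
from_describe = [summary["25%"], summary["50%"], summary["75%"]]

print(by_hand)
print(from_describe)
print(np.allclose(by_hand, from_describe))
```

For example, the first quartile sits at position 0.25 * 11 = 2.75, so it lies three quarters of the way from 66 to 70.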
In numpy >= 1.22 the interpolation parameter is deprecated and replaced by method.
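If the same code has to run on numpy releases from both sides of that change, one option is to try the new keyword first and fall back to the old one. This is only a sketch: percentile_compat is a hypothetical helper name, and it assumes that an older numpy rejects the unknown method keyword with a TypeError.

```python
import numpy as np


def percentile_compat(a, q, kind="linear"):
    """Hypothetical wrapper: call np.percentile with whichever keyword is accepted."""
    try:
        # numpy >= 1.22 spells the option 'method'
        return np.percentile(a, q, method=kind)
    except TypeError:
        # assumption: older numpy raises TypeError for the unknown keyword,
        # so fall back to the pre-1.22 spelling
        return np.percentile(a, q, interpolation=kind)


s_even = [18, 45, 66, 70, 76, 83, 88, 90, 90, 95, 95, 98]
print(percentile_compat(s_even, [25, 50, 75], kind="midpoint"))
```

The trade-off is that any other TypeError from the first call triggers the fallback as well. An explicit check of the installed numpy version would be stricter.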
Here is an example using linear interpolation (the default behavior):
import pandas as pd
import numpy as np
s = [18, 45, 66, 70, 76, 83, 88, 90, 90, 95, 95, 98, 100]
print(pd.DataFrame(s).quantile(q=[.25, .50, .75]))
print(np.percentile(s, q=[25, 50, 75]))
print(pd.DataFrame(s).describe(percentiles=[.25, .5, .75])) # the parameter is redundant, it's the default behavior
Here is the same computation using midpoint interpolation:
s_even = [18, 45, 66, 70, 76, 83, 88, 90, 90, 95, 95, 98]
print(pd.DataFrame(s_even).quantile(q=[.25, .5, .75], interpolation='midpoint'))
print(np.percentile(s_even, q=[25, 50, 75], interpolation='midpoint')) # numpy < 1.22
print(np.percentile(s_even, q=[25, 50, 75], method='midpoint')) # numpy >= 1.22
s_odd = s_even + [100] # one more element, so the length is odd
print(pd.DataFrame(s_odd).quantile(q=[.25, .50, .75], interpolation='midpoint'))
print(np.percentile(s_odd, q=[25, 50, 75], interpolation='midpoint')) # numpy < 1.22
print(np.percentile(s_odd, q=[25, 50, 75], method='midpoint')) # numpy >= 1.22
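The even and odd lists are worth comparing side by side. With 13 values the quartile positions q * (n - 1) are 3, 6 and 9, which land exactly on data points, so midpoint and linear should agree. With 12 values the positions fall between data points and the two techniques can differ. The sketch below uses only pandas so that it does not depend on the numpy keyword name. The results dictionary is just an illustrative way to keep the output.

```python
import pandas as pd

s_even = [18, 45, 66, 70, 76, 83, 88, 90, 90, 95, 95, 98]
s_odd = s_even + [100]
q = [0.25, 0.50, 0.75]

results = {}
for name, data in (("even", s_even), ("odd", s_odd)):
    series = pd.Series(data)
    # same quantiles, two interpolation techniques
    linear = series.quantile(q, interpolation="linear").tolist()
    midpoint = series.quantile(q, interpolation="midpoint").tolist()
    results[name] = (linear, midpoint)
    print(name, "linear:", linear, "midpoint:", midpoint)
```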