Could you please tell me why the results differ when quantiles are calculated in pandas (Python) and in R?

Pandas code:

    print('p_new:   {:>5}   {:>5}     {:>5}'.format(
        round(self.pandas_data_frame['pending_new'].quantile(0.50), 2),
        round(self.pandas_data_frame['pending_new'].quantile(0.95), 2),
        round(self.pandas_data_frame['pending_new'].quantile(0.99), 2),
    ))

    print('new:     {:>5}   {:>5}   {:>5}'.format(
        round(self.pandas_data_frame['new'].quantile(0.50), 2),
        round(self.pandas_data_frame['new'].quantile(0.95), 2),
        round(self.pandas_data_frame['new'].quantile(0.99), 2),
    ))

results:

name     |   .50|    .95|    .99| 
p_new:     2.0    12.0      20.0
new:      52.0    78.0   106.06

R code:

dd = read.csv("stats.csv")
quantile(dd$pending_new, c(.50, .95, .99))
quantile(dd$new, c(.50, .95, .99))

results:

> quantile(dd$pending_new, c(.50, .95, .99))
50%  95%  99% 
2.0 13.1 34.0 
> quantile(dd$new, c(.50, .95, .99))
50%    95%    99% 
52.00  81.00 129.26 
lmo
bhjkaser

Use the sources ([pandas](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.quantile.html) and [R](http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html)), Luke! – Nelewout Jun 09 '18 at 08:58

There are many different [ways](https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample) to estimate quantiles. R uses one way by default, pandas uses another. – ayhan Jun 09 '18 at 09:22

1 Answer

In Python, the functions in the np.percentile() family, and pandas' quantile() as well, take an optional argument, interpolation, which controls how a quantile is estimated when it falls between two data points. Try setting this argument to 'midpoint' and check whether your results match those from R. You can also read more about the Python function in the question "How to calculate 1st and 3rd quartiles?".
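
To see how much the choice matters, here is a minimal sketch that evaluates the same quantile with every interpolation option pandas accepts. The sample values are made up for illustration and are not taken from stats.csv. As far as I know, R's quantile() defaults to type = 7, which corresponds to 'linear' here, so it is worth comparing each option against your R output rather than assuming one of them.

```python
import pandas as pd

# Made-up sample for illustration only; NOT the asker's stats.csv data.
s = pd.Series([1, 2, 3, 4, 10])

# Interpolation choices accepted by Series.quantile.
methods = ["linear", "lower", "higher", "midpoint", "nearest"]

# Evaluate the 95th percentile once per method.
# With 5 points the 0.95 quantile sits at position 3.8,
# i.e. between the sorted values 4 and 10.
results = {m: s.quantile(0.95, interpolation=m) for m in methods}

for name, value in results.items():
    print(f"{name:>8}: {value}")
```

On a small sample like this the options disagree noticeably in the upper tail, which is the same pattern as in the question, where the medians agree but the .95 and .99 values do not. If none of the options reproduces the R numbers, it may be worth checking that both tools are reading the same rows.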

Shikhar Parashar