
I was trying to understand how pandas calculates the lower/upper percentiles and got a bit confused. Here is the sample code and its output.

```python
import pandas as pd

test = pd.Series([7, 15, 36, 39, 40, 41])
test.describe()
```

output:

(screenshot of the `describe()` output; image not available)

I am interested only in the 25% and 75% percentiles. Which method does pandas use to calculate them?

According to the Wikipedia article https://en.wikipedia.org/wiki/Quartile, the results are different:

(screenshot of the Wikipedia quartile results; image not available)

So what statistical/mathematical method does pandas use to calculate percentiles?
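For comparison, here is a minimal sketch of one common textbook convention: take the median of the lower and upper halves of the sorted data. The screenshot that showed which Wikipedia method was used is not recoverable, so treat this only as an assumed example of a rule that gives different quartiles from pandas' default:

```python
import statistics

import pandas as pd

data = sorted([7, 15, 36, 39, 40, 41])

# Median-of-halves quartiles (for odd n this rule leaves the overall median out of both halves)
half = len(data) // 2
q1_halves = statistics.median(data[:half])   # median of [7, 15, 36]
q3_halves = statistics.median(data[-half:])  # median of [39, 40, 41]

# pandas' default (interpolation="linear")
q1_pandas, q3_pandas = pd.Series(data).quantile([0.25, 0.75]).tolist()

print(q1_halves, q3_halves)  # 15 40
print(q1_pandas, q3_pandas)  # 20.25 39.75
```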

Natig Aliyev
  • [pd.Series.quantile](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.quantile.html)? or [pd.DataFrame.quantile](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.quantile.html)? – Abdou Jan 19 '17 at 14:34
  • @Abdou No, I mean within the `.describe()` method. And actually, `pd.Series.quantile` – Natig Aliyev Jan 19 '17 at 14:38
  • Also related http://stackoverflow.com/q/38596100/2285236 – ayhan Jan 19 '17 at 14:48
  • @ayhan Thanks for the comment. Yes, it is somewhat related, but unfortunately unanswered. – Natig Aliyev Jan 19 '17 at 14:55
  • @NatigAliyev, take a look at `quantile` with `from pandas.core.algorithms import quantile`. That `quantile` function has an `interpolation_method` parameter; if that helps at all. It may also be linked to [`numpy's percentile method`](https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.percentile.html). – Abdou Jan 19 '17 at 14:59
  • @Abdou I really appreciate your help. I used `quantile`, tried several examples, and finally figured out how Python calculates the quantile/percentile mathematically. – Natig Aliyev Jan 19 '17 at 15:45
  • @NatigAliyev, that sounds great. Would you mind sharing your solution here, so we can all benefit from your findings? – Abdou Jan 19 '17 at 15:47
  • @Abdou I posted my answer. Thanks – Natig Aliyev Jan 19 '17 at 17:26
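Following up on the `numpy.percentile` link in the comments, here is a minimal sketch (assuming a recent NumPy and pandas). It suggests that both libraries default to linear interpolation for this data, and that the `interpolation` argument of `Series.quantile` changes the answer:

```python
import numpy as np
import pandas as pd

data = [7, 15, 36, 39, 40, 41]
s = pd.Series(data)

# NumPy's default for percentile() is linear interpolation between neighbouring ranks
np_q = np.percentile(data, [25, 75]).tolist()

# pandas' Series.quantile() also defaults to interpolation="linear"
pd_q = s.quantile([0.25, 0.75]).tolist()
print(np_q, pd_q)  # [20.25, 39.75] [20.25, 39.75]

# The `interpolation` argument picks other rules for landing between two ranks
results = {
    how: s.quantile([0.25, 0.75], interpolation=how).tolist()
    for how in ["linear", "lower", "higher", "midpoint", "nearest"]
}
for how, values in results.items():
    print(how, values)
```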

2 Answers


As I mentioned in the comments, I finally figured out how it works by trying the `quantile` function (`from pandas.core.algorithms import quantile`), as @Abdou suggested.

I am not good at explaining this in general terms, so I will only walk through the 25% and 75% percentiles for this particular example. Here is a brief (maybe poor) explanation:

For the example list [7, 15, 36, 39, 40, 41], the values sit at the following quantiles (6 values leave 5 equal gaps, so consecutive values are 100% / 5 = 20% apart):

7 -> 0%

15 -> 20%

36 -> 40%

39 -> 60%

40 -> 80%

41 -> 100%
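The percentages in the list above come from spacing the sorted values evenly between 0% and 100%. Here is a small sketch of that rule. The helper name `rank_positions` is hypothetical, and the helper assumes at least two values:

```python
def rank_positions(values):
    """Return (value, fraction) pairs: the i-th smallest of n values sits at i / (n - 1).

    Hypothetical helper for illustration; assumes len(values) >= 2.
    """
    xs = sorted(values)
    n = len(xs)
    return [(x, i / (n - 1)) for i, x in enumerate(xs)]


positions = rank_positions([7, 15, 36, 39, 40, 41])
for value, frac in positions:
    print(f"{value} -> {frac:.0%}")  # 7 -> 0%, 15 -> 20%, ..., 41 -> 100%
```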

Since we want the 25th percentile, it lies between 15 (20%) and 36 (40%). More precisely, 25% = 20% + 5%, so the value is 15 + (36-15)/4 = 15 + 5.25 = 20.25.

We use (36-15)/4 because the gap between 15 and 36 covers 40% - 20% = 20%, and 5% is one quarter of that 20%.
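Here is the same arithmetic as a plain-Python sketch. It assumes a non-empty list and `0 <= q <= 1`. The helper `linear_quantile` is a hypothetical name, not a pandas function:

```python
import math


def linear_quantile(values, q):
    """Quantile by linear interpolation between neighbouring ranks (hypothetical helper)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)        # fractional rank, e.g. 0.25 * 5 = 1.25
    lo = math.floor(pos)           # rank just below: index 1 -> 15
    hi = min(lo + 1, len(xs) - 1)  # rank just above: index 2 -> 36 (clamped at the last value)
    frac = pos - lo                # 0.25, i.e. 5% out of the 20% gap
    return xs[lo] + frac * (xs[hi] - xs[lo])


q1 = linear_quantile([7, 15, 36, 39, 40, 41], 0.25)
print(q1)  # 20.25
```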

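The 75th percentile is found the same way. It lies between 39 (60%) and 40 (80%), and 75% = 60% + 15%, which is three quarters of that 20% gap:

39 + 3*(40-39)/4 = 39.75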
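The sketch below repeats the hypothetical `linear_quantile` helper, computes the 75% value, and checks the hand-rolled rule against `Series.quantile` for every 5th percentile. It assumes pandas' default `interpolation="linear"`:

```python
import math

import pandas as pd


def linear_quantile(values, q):
    """Hypothetical helper: linear interpolation between neighbouring ranks, 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


data = [7, 15, 36, 39, 40, 41]
q3 = linear_quantile(data, 0.75)
print(q3)  # 39.75

# Collect any percentile (0%, 5%, ..., 100%) where the hand-rolled rule disagrees with pandas
s = pd.Series(data)
mismatches = [
    pct
    for pct in range(0, 101, 5)
    if not math.isclose(linear_quantile(data, pct / 100), s.quantile(pct / 100))
]
print(mismatches)  # []
```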

That's it. Sorry for the poor explanation.

NOTE: Thank you @shin for the correction mentioned in the comment.

Natig Aliyev

It does `[series.quantile(x) for x in percentiles]`, where `percentiles` defaults to `np.array([0.25, 0.5, 0.75])` if it is not provided.

You can see that in pandas/pandas/core/generic.py

So it uses `pd.Series.quantile`: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.quantile.html
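Here is a minimal sketch of what this answer describes, assuming a recent pandas. The default `describe()` percentiles match a list comprehension over `Series.quantile`, and `describe(percentiles=...)` requests other percentiles:

```python
import numpy as np
import pandas as pd

test = pd.Series([7, 15, 36, 39, 40, 41])

# describe()'s default percentiles, written out explicitly
percentiles = np.array([0.25, 0.5, 0.75])
from_quantile = [float(test.quantile(x)) for x in percentiles]
from_describe = test.describe()[["25%", "50%", "75%"]].tolist()
print(from_quantile)  # [20.25, 37.5, 39.75]
print(from_describe)  # [20.25, 37.5, 39.75]

# Other percentiles can be requested; describe() also keeps the median (50%) row
custom = test.describe(percentiles=[0.1, 0.9])
print(custom)
```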

Alex
  • Thanks for the response. Actually, I am asking which statistical method it uses to calculate the 25% and 75% percentiles. – Natig Aliyev Jan 19 '17 at 14:47