Calculating Percentile in Python Pandas Dataframe

Question

I'm trying to calculate the percentile of each number within a dataframe and add it to a new column called 'percentile'.

This is my attempt:

import pandas as pd
from scipy import stats

data = {'symbol':'FB','date':['2012-05-18','2012-05-21','2012-05-22','2012-05-23'],'close':[38.23,34.03,31.00,32.00]}

df = pd.DataFrame(data)

close = df['close']

for i in df:
    df['percentile'] = stats.percentileofscore(close,df['close'])

The column is not being filled and results in 'NaN'. This should be fairly easy, but I'm not sure where I'm going wrong.

Thanks in advance for the help.

No need to loop with `for i in df`; see this answer: https://stackoverflow.com/a/44607827/1870832 – Max Power, Jun 18 '17 at 03:06
You should learn how broadcasting works in pandas; see this [broadcast](https://stackoverflow.com/a/29955358/5496463) answer. – danche, Jun 18 '17 at 03:16
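To illustrate the two comments above, here is a minimal sketch using the question's data. Iterating over a DataFrame yields its column labels, not its rows. Column-wide operations are broadcast element by element, so a single assignment fills every row. The column name `close_vs_max` is hypothetical and chosen only for illustration.

```python
import pandas as pd

data = {'symbol': 'FB',
        'date': ['2012-05-18', '2012-05-21', '2012-05-22', '2012-05-23'],
        'close': [38.23, 34.03, 31.00, 32.00]}
df = pd.DataFrame(data)

# Iterating a DataFrame yields column labels, so `for i in df` loops
# over 'symbol', 'date', 'close' rather than over rows.
for label in df:
    print(label)

# The scalar 'FB' was broadcast to all four rows when the frame was built.
print(df['symbol'].tolist())

# Series / scalar is computed element-wise; assigning the resulting Series
# fills every row at once (aligned on the index), with no Python loop.
df['close_vs_max'] = df['close'] / df['close'].max()  # hypothetical column
print(df)
```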

score 8 · Accepted Answer · answered Jun 18 '17 at 04:20

df.close.apply(lambda x: stats.percentileofscore(df.close, x))

or

df.close.rank(pct=True)

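Output of `df.close.rank(pct=True)` (the `percentileofscore` version gives the same ranking on a 0 to 100 scale: 100.0, 75.0, 25.0, 50.0):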

0    1.00
1    0.75
2    0.25
3    0.50
Name: close, dtype: float64
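The following minimal sketch puts both approaches together with the question's data and writes the result into the new 'percentile' column the question asks for. The extra column `percentile_scipy` is a hypothetical name, used only to compare the two results side by side.

```python
import pandas as pd
from scipy import stats

data = {'symbol': 'FB',
        'date': ['2012-05-18', '2012-05-21', '2012-05-22', '2012-05-23'],
        'close': [38.23, 34.03, 31.00, 32.00]}
df = pd.DataFrame(data)

# Vectorized: average rank divided by the number of rows (0 to 1 scale).
df['percentile'] = df['close'].rank(pct=True)

# Element-wise alternative: scipy returns a percentage (0 to 100 scale).
# Sorting the input first is not required.
df['percentile_scipy'] = df['close'].apply(
    lambda x: stats.percentileofscore(df['close'], x))

print(df[['close', 'percentile', 'percentile_scipy']])
```

Multiply the `rank` result by 100 if you want the same scale as `percentileofscore`.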

Scott Boston

very simple answer, thanks @scott-boston – mattblack Jun 18 '17 at 04:55

Use `.rank`; it should be significantly faster – Brad Solomon Jun 18 '17 at 16:45

`.rank` is 100% what you should use. That lambda function, while correct, will be MUCH slower – Mate Hegedus Jan 01 '21 at 20:43
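The speed difference comes from how the work is done. Each `percentileofscore` call scans the whole column, so applying it row by row costs O(n²) comparisons. Ranking, by contrast, is essentially a single sort. The sketch below uses a hypothetical helper, `percentile_of_each`, and assumes scipy's default `kind='rank'`, which averages the rankings of tied values. It computes the same percentages in one vectorized pass with `np.searchsorted`, and they match `rank(pct=True) * 100`, ties included.

```python
import numpy as np
import pandas as pd


def percentile_of_each(values):
    """Hypothetical helper: percentileofscore(values, x) (kind='rank')
    for every x in values, computed in one vectorized pass."""
    arr = np.asarray(values, dtype=float)
    n = arr.size
    sorted_arr = np.sort(arr)
    # left: count of elements strictly less than each value.
    left = np.searchsorted(sorted_arr, arr, side='left')
    # right: count of elements less than or equal to each value.
    right = np.searchsorted(sorted_arr, arr, side='right')
    # Every value occurs in the array, so right > left and the
    # 'rank' definition reduces to (left + right + 1) * 50 / n.
    return (left + right + 1) * 50.0 / n


close = pd.Series([38.23, 34.03, 31.00, 32.00])
print(percentile_of_each(close))
print((close.rank(pct=True) * 100).tolist())

# Ties: rank's default method='average' agrees with kind='rank'.
ties = pd.Series([1, 2, 2, 3])
print(percentile_of_each(ties))
print((ties.rank(pct=True) * 100).tolist())
```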
