1

I have the following data frame

   item_id group  price
0        1     A     10
1        3     A     30
2        4     A     40
3        6     A     60
4        2     B     20
5        5     B     50

I am looking to add a quantile column based on the price for each group like below:

item_id       group        price    quantile
 01            A            10        0.25
 03            A            30        0.5
 04            A            40        0.75
 06            A            60        1.0
 02            B            20        0.5
 05            B            50        1.0

I could loop over entire data frame and perform computation for each group. However, I am wondering is there a more elegant way to resolve this? Thanks!

anky
  • 74,114
  • 11
  • 41
  • 70
Edamame
  • 23,718
  • 73
  • 186
  • 320

2 Answers2

8

You need df.rank() with pct=True:

pct : bool, default False Whether or not to display the returned rankings in percentile form.

df['quantile']=df.groupby('group')['price'].rank(pct=True)
print(df)

   item_id group  price  quantile
0        1     A     10      0.25
1        3     A     30      0.50
2        4     A     40      0.75
3        6     A     60      1.00
4        2     B     20      0.50
5        5     B     50      1.00
anky
  • 74,114
  • 11
  • 41
  • 70
  • Thanks! But I got the following warning/error without the output results. SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead – Edamame Dec 21 '19 at 17:36
  • 1
    that is just because the `df` is copied from another df, with some operations , eg: `dropna()` , where you should use `.reset_index()` or `.copy()` , check this : https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas – anky Dec 21 '19 at 17:38
1

Although the df.rank method above is probably the way to go for this problem. Here's another solution using pd.qcut with GroupBy:

df['quantile'] = (
    df.groupby('group')['price']
      .apply(lambda x: pd.qcut(x, q=len(x), labels=False)
             .add(1)
             .div(len(x))
            )
)
   item_id group  price  quantile
0        1     A     10      0.25
1        3     A     30      0.50
2        4     A     40      0.75
3        6     A     60      1.00
4        2     B     20      0.50
5        5     B     50      1.00
Erfan
  • 40,971
  • 8
  • 66
  • 78