Pandas groupby where the column value is greater than the group's x percentile

Question

I have a pandas dataframe as follows:

df = pd.DataFrame()
df['Name'] = ['Abby', 'Abby', 'Abby', 'Abby', 'Abby', 'Daniel', 'Daniel', 'Daniel', 'Daniel', 'Daniel']
df['Marks'] = [100, 90, 76, 50, 10, 50, 45, 38, 25, 5]

I want to:

Find the 40th percentile for each group
Filter the dataframe such that all the values above the 40th percentile for that group are shown.

So, I have found the 40th percentile for each group using:

df.groupby('Name').quantile(0.4)

The Aim is to get to:

My main issue is that the values for each group are not standardized and so I cannot apply an overall percentile value for the entire dataset.

But all the help I saw regarding filtering a dataframe with a certain value does not do it separately for each group. I have seen the following questions:

Pandas, groupby where column value is greater than x

Pandas Groupby apply function to count values greater than zero

My question essentially builds on a variation of the following question: Calculate Arbitrary Percentile on Pandas GroupBy

Is there a way to do this in Pandas?

score 4 · Accepted Answer · answered Sep 19 '18 at 20:32

4

You can using transform

df[df.Marks>df.groupby('Name').Marks.transform('quantile',0.4)]
Out[712]: 
     Name  Marks
0    Abby    100
1    Abby     90
2    Abby     76
5  Daniel     50
6  Daniel     45
7  Daniel     38

answered Sep 19 '18 at 20:32

BENY

317,841
20
164
234

is it possible to use this approach for the expanding quantile instead of the full-sample quantile? – Yuca Sep 19 '18 at 20:36
1

I came up with same solution, only 4 mins late:) – Vaishali Sep 19 '18 at 20:37
@Yuca what you mean the expending quantile ? I think we can using lambda here – BENY Sep 19 '18 at 20:57
i'm sure you know about rolling vs expanding in pandas. The expanding quantile would be the quantile available at each rows. I was just curious because I do that with numba and assign – Yuca Sep 19 '18 at 20:59
@Yuca `df.groupby('Name').Marks.expanding().quantile(0.5) ` – BENY Sep 19 '18 at 21:07

rafaelc · Answer 2 · 2018-09-19T20:38:35.117

1

Using your code for percentiles, and loc, ge for >= (or gt, for >) and index matching:

df = df.set_index('Name')
df.loc[df.Marks.ge(df.groupby('Name').quantile(0.4).Marks)]

    Name    Marks
0   Abby    100
1   Abby    90
2   Abby    76
5   Daniel  50
6   Daniel  45
7   Daniel  38

edited Sep 19 '18 at 20:38

answered Sep 19 '18 at 20:32

rafaelc

57,686
15
58
82

Pandas groupby where the column value is greater than the group's x percentile

2 Answers2