Removing 1% top and bottom percentiles given a condition

Question

I have the following dataset and I would like to remove that 1% top and bottom percentiles for each "PRIMARY_SIC_CODE" on the column "ROA", i.e., take all the different ROAS for each PRIMARY_SIC_CODE, and remove the quantiles and the rest of the rows in the dataset.

Is there any easy way of doing it? Thanks!

score 1 · Accepted Answer · answered Dec 18 '22 at 00:16

1

If you want to exclude the top and bottom 1% by considering the column ROAS in its entirety:

top_1perc = df['ROA'].quantile(q=0.99)
bottom_1perc = df['ROA'].quantile(q=0.01)
new_df = df[(df['ROA']> bottom_1perc) & (df['ROA']< top_1perc)

If instead, you want to exclude them for each PRIMARY SIC CODE group:

df[df.groupby('PRIMARY SIC CODE')['ROA'].transform(\
   lambda x : ((x > x.quantile(q=0.01)) & (x<x.quantile(q=0.99)))).eq(1)]

answered Dec 18 '22 at 00:16

E_Lasker

36
2

Thanks. The second one is the one that I need, thanks a lot! Question: I tried one thing which is very similar but using .apply. What's the difference with .transform? Which one is better? Thanks! – R__ Dec 19 '22 at 17:57
[This post](https://stackoverflow.com/questions/27517425/apply-vs-transform-on-a-group-object) goes in-depth on the difference between transform() and apply(). – E_Lasker Dec 20 '22 at 19:50

score 0 · Answer 2 · answered Dec 17 '22 at 21:08

0

Try along the lines of...

df.groupby("PRIMARY SIC CODE")['ROA'].quantile(q=0.1)

answered Dec 17 '22 at 21:08

Raghu

41
7

Removing 1% top and bottom percentiles given a condition

2 Answers2