0

I have the following dataset and I would like to remove that 1% top and bottom percentiles for each "PRIMARY_SIC_CODE" on the column "ROA", i.e., take all the different ROAS for each PRIMARY_SIC_CODE, and remove the quantiles and the rest of the rows in the dataset.

Is there any easy way of doing it? Thanks!

enter image description here

R__
  • 117
  • 1
  • 7

2 Answers2

1

If you want to exclude the top and bottom 1% by considering the column ROAS in its entirety:

top_1perc = df['ROA'].quantile(q=0.99)
bottom_1perc = df['ROA'].quantile(q=0.01)
new_df = df[(df['ROA']> bottom_1perc) & (df['ROA']< top_1perc)

If instead, you want to exclude them for each PRIMARY SIC CODE group:

df[df.groupby('PRIMARY SIC CODE')['ROA'].transform(\
   lambda x : ((x > x.quantile(q=0.01)) & (x<x.quantile(q=0.99)))).eq(1)]
E_Lasker
  • 36
  • 2
  • Thanks. The second one is the one that I need, thanks a lot! Question: I tried one thing which is very similar but using .apply. What's the difference with .transform? Which one is better? Thanks! – R__ Dec 19 '22 at 17:57
  • [This post](https://stackoverflow.com/questions/27517425/apply-vs-transform-on-a-group-object) goes in-depth on the difference between transform() and apply(). – E_Lasker Dec 20 '22 at 19:50
0

Try along the lines of...

df.groupby("PRIMARY SIC CODE")['ROA'].quantile(q=0.1)

Raghu
  • 41
  • 7