Pandas groupby, select 3 elements with largest values, and take the mean of each group

Question

I have some DataFrame:

import pandas as pd
df = pd.DataFrame({'columnA': ['apple', 'apple', 'apple', 'apple', 'orange', 'orange', 'orange', 'orange'], 'columnB': [0.10, -0.15, 0.25, -0.55, 0.50, -0.51, 0.70, 0.90]})

    columnA columnB
0   apple   0.10
1   apple   -0.15
2   apple   0.25
3   apple   -0.55
4   orange  0.50
5   orange  -0.51
6   orange  0.70
7   orange  0.90

I want to group the data by columnA and take the mean of the 3 rows with the largest absolute values in columnB.
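For reference, the target numbers can also be computed per group without sorting the whole frame first. Below is a minimal sketch that swaps in `Series.nlargest` on the absolute values inside each group. This is a different technique from the sort-then-`head` approach tried here, and the helper name `mean_of_top3_abs` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'columnA': ['apple'] * 4 + ['orange'] * 4,
    'columnB': [0.10, -0.15, 0.25, -0.55, 0.50, -0.51, 0.70, 0.90],
})

def mean_of_top3_abs(s: pd.Series) -> float:
    # Hypothetical helper: index labels of the 3 largest |value| entries in this group
    top_labels = s.abs().nlargest(3).index
    # Average the original signed values at those labels
    return s.loc[top_labels].mean()

result = df.groupby('columnA')['columnB'].apply(mean_of_top3_abs).reset_index()
print(result)
```

Because `nlargest` works on labels within each group, no global ordering of the frame is needed.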

The first thing I tried was:

df.reindex(df['columnB'].abs().sort_values(ascending=False).index).groupby('columnA').head(3).groupby('columnA')[['columnB']].mean().reset_index()

    columnA columnB
0   apple   -0.150000
1   orange  0.363333

This looks correct, but I wanted to try to simplify it with this:

df.iloc[df['columnB'].abs().argsort()].groupby('columnA').head(3).groupby('columnA')[['columnB']].mean().reset_index()

    columnA columnB
0   apple   0.066667
1   orange  0.230000

This is not correct. What am I missing here?
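To see what the second attempt actually selected, here is a minimal sketch that reuses the question's data. It prints the positions returned by `argsort` and the per-group means of the rows that `head(3)` then keeps:

```python
import pandas as pd

df = pd.DataFrame({
    'columnA': ['apple'] * 4 + ['orange'] * 4,
    'columnB': [0.10, -0.15, 0.25, -0.55, 0.50, -0.51, 0.70, 0.90],
})

abs_b = df['columnB'].abs()

# argsort returns the positions that sort the values in ASCENDING order,
# so the rows with the smallest |columnB| come first.
ascending_pos = abs_b.argsort().to_numpy()
print(ascending_pos)

# head(3) on that ordering keeps the 3 smallest |columnB| rows per group,
# which is why the second attempt printed 0.066667 and 0.230000.
smallest = df.iloc[ascending_pos].groupby('columnA').head(3)
means = smallest.groupby('columnA')['columnB'].mean()
print(means)
```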

jezrael · Accepted Answer · 2019-12-16T06:33:26.250


`argsort` sorts in ascending order, so `head(3)` picked the three smallest absolute values. I think you can either negate the values or reverse the order of the positions. Check this:

df1 = (df.iloc[(-df['columnB'].abs()).argsort()]
          .groupby('columnA')['columnB'].apply(lambda x: x.head(3).mean())
          .reset_index())
print (df1)
  columnA   columnB
0   apple -0.150000
1  orange  0.363333

df1 = (df.iloc[df['columnB'].abs().argsort()[::-1]]
          .groupby('columnA')['columnB'].apply(lambda x: x.head(3).mean())
          .reset_index())
print (df1)
  columnA   columnB
0   apple -0.150000
1  orange  0.363333
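If your pandas version is 1.1 or newer (an assumption: the `key` parameter was added to `sort_values` in pandas 1.1.0), you can get the same ordering without argsort positions by passing a `key` to `sort_values`. The minimal sketch below is a swapped-in variant, not the answer's code:

```python
import pandas as pd

df = pd.DataFrame({
    'columnA': ['apple'] * 4 + ['orange'] * 4,
    'columnB': [0.10, -0.15, 0.25, -0.55, 0.50, -0.51, 0.70, 0.90],
})

df1 = (df.sort_values('columnB', key=lambda s: s.abs(), ascending=False)  # largest |columnB| first
         .groupby('columnA')
         .head(3)                                   # first 3 rows of each group in that order
         .groupby('columnA', as_index=False)['columnB']
         .mean())
print(df1)
```

Using `as_index=False` keeps `columnA` as a column, so no `reset_index()` is needed.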

edited Dec 16 '19 at 06:33

answered Dec 16 '19 at 06:24



So all I need to do is reverse the order of argsort? – Cactus Philosopher Dec 16 '19 at 06:25
Any reason to use one approach over the other? Do you have a preference? – Cactus Philosopher Dec 16 '19 at 06:29

@BuffaloCollector - hmmm, I think the argsort solution should be faster; I prefer converting the values to negative. – jezrael Dec 16 '19 at 06:38
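To check the speed question on your own data, here is a minimal benchmark sketch. It uses a hypothetical larger random frame (`big`, with an arbitrary size and arbitrary group labels) and wraps the answer's two variants in hypothetical helpers, `negate_first` and `reverse_positions`. Timings depend on the machine and pandas version, so no numbers are shown:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame; size and group labels are arbitrary choices.
rng = np.random.default_rng(0)
n = 100_000
big = pd.DataFrame({
    'columnA': rng.choice(['apple', 'orange', 'pear'], size=n),
    'columnB': rng.normal(size=n),
})

def negate_first(d):
    # Negate |columnB| so an ascending argsort puts the largest magnitudes first
    return (d.iloc[(-d['columnB'].abs()).argsort()]
             .groupby('columnA')['columnB'].apply(lambda x: x.head(3).mean()))

def reverse_positions(d):
    # Ascending argsort, then reverse the positions so the largest magnitudes come first
    return (d.iloc[d['columnB'].abs().argsort()[::-1]]
             .groupby('columnA')['columnB'].apply(lambda x: x.head(3).mean()))

for f in (negate_first, reverse_positions):
    # timeit runs immediately, so the lambda sees the current f
    print(f.__name__, timeit.timeit(lambda: f(big), number=10))
```

With ties in `|columnB|`, the two variants can order tied rows differently, so they may pick different rows at the cutoff. With continuous random data like this, ties are very unlikely.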
