Taking mean of N largest values of group by absolute value

Question

I have some DataFrame:

import numpy as np
import pandas as pd
d = {'fruit': ['apple', 'pear', 'peach'] * 6, 'values': np.random.uniform(-5, 5, 18), 'values2': np.random.uniform(-5, 5, 18)}
df = pd.DataFrame(data=d)

I can take the mean of each fruit group as such:

df.groupby('fruit').mean()

However, for each group of fruit, I'd like to take the mean of the N largest values as ranked by absolute value.

So for example, if my values were as follows and N=3:

[ 0.7578507 ,  3.81178045, -4.04810913,  3.08887538,  2.87999752, 4.65670954]

The desired outcome would be (4.65670954 + -4.04810913 + 3.81178045) / 3 = ~1.47

Edit - to clarify that sign is preserved in outcome:

(4.65670954 + -20.04810913 + 3.81178045) / 3 = -3.859
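As a quick sanity check of the two calculations above, here is a minimal pure-Python sketch. The helper name `mean_of_n_largest_abs` is made up for illustration and is not a pandas or NumPy function; the second list is simply the first one with -4.04810913 swapped for -20.04810913.

```python
def mean_of_n_largest_abs(values, n):
    # Rank by absolute value (largest first), keep the top n,
    # then average the original signed values.
    top = sorted(values, key=abs, reverse=True)[:n]
    return sum(top) / len(top)

vals = [0.7578507, 3.81178045, -4.04810913, 3.08887538, 2.87999752, 4.65670954]
first = mean_of_n_largest_abs(vals, 3)

# Same list, but with the negative value made much larger in magnitude.
vals_edit = [0.7578507, 3.81178045, -20.04810913, 3.08887538, 2.87999752, 4.65670954]
second = mean_of_n_largest_abs(vals_edit, 3)

print(round(first, 2), round(second, 2))  # prints: 1.47 -3.86
```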

Tom · Accepted Answer · 2020-07-15T21:28:54.960


Updating with a new approach that I think is simpler. I was avoiding `apply` like the plague, but maybe this is one of the more acceptable uses. Plus, it does what you actually want: it averages the original (signed) values, selected by their absolute values:

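def foo(d):
    # Labels of the 3 largest absolute values in this group, used to index the original signed values
    return d.loc[d.abs().nlargest(3).index].mean()

out = df.groupby('fruit')['values'].apply(foo)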

So you index each group by the labels of its 3 largest absolute values, then take the mean of the original values at those labels.
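If N needs to vary, the same idea can be parameterized. This is only a sketch under a couple of assumptions: the name `mean_top_n_abs` and the small fixed DataFrame are invented for illustration, and the index is assumed to be unique (as with the default `RangeIndex`), since the labels from `nlargest` are used to look the values back up.

```python
import pandas as pd

def mean_top_n_abs(s, n=3):
    # s is one group's Series; nlargest on |s| gives the labels
    # of the n values with the largest magnitude.
    top_labels = s.abs().nlargest(n).index
    # Look up the original (signed) values at those labels, then average.
    return s.loc[top_labels].mean()

# Small fixed example so the result is easy to check by hand.
df = pd.DataFrame({
    'fruit': ['apple', 'pear'] * 4,
    'values': [1.0, -2.0, -5.0, 3.0, 4.0, -6.0, 0.5, 1.0],
})

# Extra keyword arguments to apply are forwarded to the function.
out = df.groupby('fruit')['values'].apply(mean_top_n_abs, n=2)
print(out)
```

Here apple's two largest magnitudes are -5.0 and 4.0 (mean -0.5), and pear's are -6.0 and 3.0 (mean -1.5).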

And for the record my original, incorrect, and slower code was:

df['values'].abs().groupby(df['fruit']).nlargest(3).groupby("fruit").mean()
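To see why that first version is incorrect for this question, a small side-by-side comparison may help. The data below is made up for illustration. Because `abs()` runs before the selection, the original one-liner averages magnitudes rather than the signed values.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple', 'pear'] * 4,
    'values': [1.0, -2.0, -5.0, 3.0, 4.0, -7.0, 0.5, 1.0],
})

# Original approach: signs are discarded before averaging.
abs_mean = df['values'].abs().groupby(df['fruit']).nlargest(2).groupby('fruit').mean()

# Corrected approach: rank by magnitude, but average the signed values.
signed_mean = df.groupby('fruit')['values'].apply(
    lambda s: s.loc[s.abs().nlargest(2).index].mean()
)

print(abs_mean['apple'], signed_mean['apple'])  # 4.5 vs -0.5
print(abs_mean['pear'], signed_mean['pear'])    # 5.0 vs -2.0
```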

edited Jul 15 '20 at 21:28

answered Jul 15 '20 at 21:15


Is the sign preserved? E.g. with values such as: (4.65670954 + -20.04810913 + 3.81178045) / 3, desired outcome is **-3.859** – Cactus Philosopher Jul 15 '20 at 21:18
@BuffaloCollector Oh no I don't think so; so you want to select the values to mean based on their absolute value, but you want to average their original values? – Tom Jul 15 '20 at 21:20
That's right, like: `np.mean(sorted([4.65670954, -20.04810913, 3.81178045], key=abs, reverse=True)[:3])` – Cactus Philosopher Jul 15 '20 at 21:21
This seems to work but it may not be the most efficient: `np.mean(sorted(df.loc[df['fruit'] == 'apple']['values'].values, key=abs, reverse=True)[:3])` – Cactus Philosopher Jul 15 '20 at 21:23
Very nice! Didn't even consider `groupby.apply`. GroupBy objects are scary... – Cactus Philosopher Jul 15 '20 at 21:34
