
I would like to calculate the average score of every group in a column weighted by another column.

I will write an example to clarify my goal. Let's say I have the following pandas dataframe:

Group  #items  score
A      10      2
A      15      4
A      20      6
B      5       5
B      10      8

My desired output would be:

Group  avg_weighted_score
A      4.444
B      7

import pandas as pd

df = pd.DataFrame(
    [['A', 10, 2], ['A', 15, 4], ['A', 20, 6], ['B', 5, 5], ['B', 10, 8]],
    columns=['Group', '#items', 'score'],
)
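For reference, the desired numbers can be checked by hand: each group's average score is weighted by its item count.

```python
# Hand-check of the desired output:
# weighted average = sum(#items * score) / sum(#items)
a = (10 * 2 + 15 * 4 + 20 * 6) / (10 + 15 + 20)  # group A: 200 / 45
b = (5 * 5 + 10 * 8) / (5 + 10)                  # group B: 105 / 15
print(round(a, 3), b)  # 4.444 7.0
```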
ThePyGuy
Javier Monsalve
    so `#Item` is the weighted column? – sammywemmy Sep 15 '21 at 09:27
    Does this answer your question? [Calculate weighted average using a pandas/dataframe](https://stackoverflow.com/questions/26205922/calculate-weighted-average-using-a-pandas-dataframe) – Tzane Sep 15 '21 at 09:28

4 Answers


Let's do this:

f = lambda x: sum(x['#items'] * x['score']) / sum(x['#items'])

df.groupby('Group').apply(f)
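For context, here is the same approach as a self-contained script (using the question's dataframe; the `weighted_avg` name below is my own):

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 10, 2], ['A', 15, 4], ['A', 20, 6], ['B', 5, 5], ['B', 10, 8]],
    columns=['Group', '#items', 'score'],
)

# Per-group weighted mean: sum(weights * values) / sum(weights)
weighted_avg = lambda x: sum(x['#items'] * x['score']) / sum(x['#items'])

out = df.groupby('Group').apply(weighted_avg)
print(out)  # Series indexed by Group: A -> 4.444..., B -> 7.0
```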
kelvt

Group the dataframe by the `Group` column, then apply a function that calculates the weighted average with `np.average`, passing the `score` column as the values and `#items` as the weights. You can then call `to_frame` with the new column name to turn the resulting series into a dataframe.

import numpy as np

(
    df.groupby('Group')
    .apply(lambda x: np.average(x['score'], weights=x['#items']))
    .to_frame('avg_weighted_score')
)
       avg_weighted_score
Group                    
A                4.444444
B                7.000000
ThePyGuy

I suspect you get better performance if you precompute the product of the weight and score columns before grouping and aggregating (just an assumption):

(df.set_index('Group')
   .assign(numerator=lambda df: df.prod(axis=1))  # row-wise product: #items * score
   .groupby('Group')
   .pipe(lambda group: group.numerator.sum() / group['#items'].sum())
)


Group
A    4.444444
B    7.000000
dtype: float64

Another solution, from @Mozway:

groups = (df.assign(w=df['#items']
                      .mul(df['score']))
           .groupby(df['Group']) 
         )

groups['w'].sum().div(groups['#items'].sum())

Group
A    4.444444
B    7.000000
dtype: float64
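As a quick sanity check (not a benchmark), the precomputed-product route can be compared against `np.average` on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['A', 10, 2], ['A', 15, 4], ['A', 20, 6], ['B', 5, 5], ['B', 10, 8]],
    columns=['Group', '#items', 'score'],
)

# Route 1: precompute the product, then a plain groupby-sum (no apply)
groups = df.assign(w=df['#items'].mul(df['score'])).groupby('Group')
fast = groups['w'].sum().div(groups['#items'].sum())

# Route 2: np.average with weights, via apply (reference result)
ref = df.groupby('Group').apply(lambda x: np.average(x['score'], weights=x['#items']))

print(np.allclose(fast, ref))  # True if both routes agree
```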


sammywemmy
    Damn I was wondering why I got the wrong answer, I used the wrong column ^^. Here is a version similar to what you did (+1 btw), but using groupby objects directly (if you want feel free to use in your answer): `groups = df.assign(w=df['#items'].mul(df['score'])).groupby(df['Group']) ; groups['w'].sum().div(groups['#items'].sum())`. This should indeed be faster than the `apply` approaches – mozway Sep 15 '21 at 09:43

Try:

import numpy as np

x = (
    df.groupby("Group")
    .apply(lambda x: np.average(x["score"], weights=x["#items"]))
    .reset_index()
    .rename(columns={0: "avg_weighted_score"})
)
print(x)

Prints:

  Group  avg_weighted_score
0     A            4.444444
1     B            7.000000
Andrej Kesely