
I would like to calculate the average score of every group in a column weighted by another column.

I will write an example to clarify my goal. Let's say I have the following pandas dataframe:

Group  #items  score
A      10      2
A      15      4
A      20      6
B      5       5
B      10      8

My desired output would be:

Group  avg_weighted_score
A      4.444
B      7

import pandas as pd

df = pd.DataFrame(
    [['A', 10, 2], ['A', 15, 4], ['A', 20, 6], ['B', 5, 5], ['B', 10, 8]],
    columns=['Group', '#items', 'score'],
)
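For reference, the desired numbers can be checked by hand: each group's average score is weighted by its item count.

```python
# Hand-check of the desired output:
# weighted average = sum(#items * score) / sum(#items)
a = (10 * 2 + 15 * 4 + 20 * 6) / (10 + 15 + 20)  # group A: 200 / 45
b = (5 * 5 + 10 * 8) / (5 + 10)                  # group B: 105 / 15
print(round(a, 3), b)  # 4.444 7.0
```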
ThePyGuy
Javier Monsalve
    so `#Item` is the weighted column? – sammywemmy Sep 15 '21 at 09:27
    Does this answer your question? [Calculate weighted average using a pandas/dataframe](https://stackoverflow.com/questions/26205922/calculate-weighted-average-using-a-pandas-dataframe) – Tzane Sep 15 '21 at 09:28

4 Answers


Let's do this:

f = lambda x: sum(x['#items'] * x['score']) / sum(x['#items'])

df.groupby('Group').apply(f)
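For context, here is the same approach as a self-contained script (using the question's dataframe; the `weighted_avg` name below is my own):

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 10, 2], ['A', 15, 4], ['A', 20, 6], ['B', 5, 5], ['B', 10, 8]],
    columns=['Group', '#items', 'score'],
)

# Per-group weighted mean: sum(weights * values) / sum(weights)
weighted_avg = lambda x: sum(x['#items'] * x['score']) / sum(x['#items'])

out = df.groupby('Group').apply(weighted_avg)
print(out)  # Series indexed by Group: A -> 4.444..., B -> 7.0
```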
kelvt

Group the dataframe by the `Group` column, then apply a function that calculates the weighted average with `np.average`, passing the `score` column as the values and `#items` as the weights. You can then call `to_frame` with the new column name to turn the resulting series into a dataframe.

import numpy as np

(
    df.groupby('Group')
    .apply(lambda x: np.average(x['score'], weights=x['#items']))
    .to_frame('avg_weighted_score')
)
       avg_weighted_score
Group                    
A                4.444444
B                7.000000
ThePyGuy

I suspect you get better performance if you precompute the product of the weight and score columns before grouping and aggregating (just an assumption):

(df.set_index('Group')
   .assign(numerator=lambda df: df.prod(axis=1))  # row-wise product: #items * score
   .groupby('Group')
   .pipe(lambda group: group.numerator.sum() / group['#items'].sum())
)


Group
A    4.444444
B    7.000000
dtype: float64

Another solution, from @Mozway:

groups = (df.assign(w=df['#items']
                      .mul(df['score']))
           .groupby(df['Group']) 
         )

groups['w'].sum().div(groups['#items'].sum())

Group
A    4.444444
B    7.000000
dtype: float64
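As a quick sanity check (not a benchmark), the precomputed-product route can be compared against `np.average` on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['A', 10, 2], ['A', 15, 4], ['A', 20, 6], ['B', 5, 5], ['B', 10, 8]],
    columns=['Group', '#items', 'score'],
)

# Route 1: precompute the product, then a plain groupby-sum (no apply)
groups = df.assign(w=df['#items'].mul(df['score'])).groupby('Group')
fast = groups['w'].sum().div(groups['#items'].sum())

# Route 2: np.average with weights, via apply (reference result)
ref = df.groupby('Group').apply(lambda x: np.average(x['score'], weights=x['#items']))

print(np.allclose(fast, ref))  # True if both routes agree
```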


sammywemmy
    Damn I was wondering why I got the wrong answer, I used the wrong column ^^. Here is a version similar to what you did (+1 btw), but using groupby objects directly (if you want feel free to use in your answer): `groups = df.assign(w=df['#items'].mul(df['score'])).groupby(df['Group']) ; groups['w'].sum().div(groups['#items'].sum())`. This should indeed be faster than the `apply` approaches – mozway Sep 15 '21 at 09:43

Try:

import numpy as np

x = (
    df.groupby("Group")
    .apply(lambda x: np.average(x["score"], weights=x["#items"]))
    .reset_index()
    .rename(columns={0: "avg_weighted_score"})
)
print(x)

Prints:

  Group  avg_weighted_score
0     A            4.444444
1     B            7.000000
Andrej Kesely