0

I would like to calculate a mean value of "bonus" according to column "first_name", but the denominator is not the sum of the cases, because not all the cases have weight of 1, instead the may have 0.5 weight.

for instance in the case of Jason the value that I want is the sum of his bonus divided by 2.5.

Since in real life I have to group by several columns, like area, etc, I would like to adapt a groupby to this situation.

Here is my try, but it gives me the normal mean


raw_data = {'area': [1,2,3,3,4],'first_name': ['Jason','Jason','Jason', 'Jake','Jake'],
        'bonus': [10,20, 10, 30, 20],'weight': [1,1,0.5,0.5,1]}
df = pd.DataFrame(raw_data, columns = ['area','first_name','bonus','weight'])
df

progster
  • 877
  • 3
  • 15
  • 27
  • Does this answer your question? [Calculate weighted average using a pandas/dataframe](https://stackoverflow.com/questions/26205922/calculate-weighted-average-using-a-pandas-dataframe) – Chayan Bansal Apr 15 '20 at 14:33

2 Answers2

2

Use:

(df.groupby('first_name')[['bonus', 'weight']].sum()
  #.add_prefix('sum_') # you could also want it
   .assign(result = lambda x: x['bonus'].div(x['weight'])))

or

(df[['first_name', 'bonus', 'weight']].groupby('first_name').sum()
  #.add_prefix('sum_')
   .assign(result = lambda x: x['bonus'].div(x['weight'])))

Output

            bonus  weight     result
first_name                          
Jake           50     1.5  33.333333
Jason          40     2.5  16.000000
ansev
  • 30,322
  • 5
  • 17
  • 31
  • just one thing. here it works perfectly, but in real life I have several columns and this code add results column, but also a lot of columns that I don't want to the df dataframe. here for example now I addes an "area" column, and you could see in the output an aggregated column with the sum of area. is there a way to avoid this area column in the results? – progster Apr 15 '20 at 15:00
  • you can select columns after or before groupby, please see updated code:) – ansev Apr 15 '20 at 15:04
1

One way is to use groupby().apply and np.average:

df.groupby('first_name').apply(lambda x: np.average(x.bonus, weights=x.weight))

Output:

first_name
Jake     23.333333
Jason    14.000000
dtype: float64
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • I'm not sure what OP is looking for *for instance in the case of Jason the value that I want is the sum of his bonus divided by 2.5.*, but good answer:) – ansev Apr 15 '20 at 14:35
  • @Quang Hoang I think that there is something wrong in your code, the correct aimed results are shown by ansev answer – progster Apr 15 '20 at 14:45
  • My code computes **weighted average**, not entirely what you want, which is want @ansev does. – Quang Hoang Apr 15 '20 at 14:47