
I am trying to calculate a weighted sum using two columns in a pandas DataFrame.

DataFrame structure:

unique_id   weight            value
1           0.061042375       20.16094523
1           0.3064548         19.50932003
1           0.008310739       18.76469039
1           0.624192086       21.25
2           0.061042375       20.23776924
2           0.3064548         19.63366165
2           0.008310739       18.76299395
2           0.624192086       21.25

.......

The output I want is:

Weighted sum for each unique_id = sum(weight * value)

Example: Weighted sum for unique_id 1 = (0.061042375 * 20.16094523) + (0.3064548 * 19.50932003) + (0.008310739 * 18.76469039) + (0.624192086 * 21.25) ≈ 20.629427
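
For reference, the sample rows above can be rebuilt with pandas like so (the answers below refer to this DataFrame as df):

import pandas as pd

# Sample data from the table above; each unique_id repeats the same four weights
df = pd.DataFrame({
    'unique_id': [1, 1, 1, 1, 2, 2, 2, 2],
    'weight': [0.061042375, 0.3064548, 0.008310739, 0.624192086] * 2,
    'value': [20.16094523, 19.50932003, 18.76469039, 21.25,
              20.23776924, 19.63366165, 18.76299395, 21.25],
})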

I checked out this answer (Calculate weighted average using a pandas/dataframe) but could not figure out how to apply it to my specific scenario.

This is what I am doing based on the above answer:

# Assume temp_weighted_sum_dataframe is the DataFrame shown above

grouped_data = temp_weighted_sum_dataframe.groupby('unique_id')  # I think this groups the data by unique_id values
weighted_sum_output = (grouped_data.weight * grouped_data.value).transform("sum")  # This should multiply weight and value for every record within each group and sum them to one value per group

# The line above raises the error: TypeError: unsupported operand type(s) for *: 'SeriesGroupBy' and 'SeriesGroupBy'

Any help is appreciated, thanks

user_11077035

3 Answers


The accepted answer in the linked question would indeed solve your problem. However, I would solve it differently with just one groupby:

u = (df.assign(s=df['weight'] * df['value'])   # per-row product: weight * value
       .groupby('unique_id')
       [['s', 'weight']]
       .sum()                                  # per-group sums of the products and of the weights
     )

u['s'] / u['weight']                           # weighted average = sum(w*v) / sum(w)

Output:

unique_id
1    20.629427
2    20.672208
dtype: float64
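
Since each group's weights in this data already sum to 1, the weighted average above equals the weighted sum the question asks for. If your weights did not sum to 1 per group, the raw weighted sum would simply be u['s'] without the division, e.g.:

# Raw weighted sum per group (no normalization by the total weight)
weighted_sum = (df.assign(s=df['weight'] * df['value'])
                  .groupby('unique_id')['s']
                  .sum())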
Quang Hoang

You could do it this way:

df['partial_sum'] = df['weight'] * df['value']            # per-row product
out = df.groupby('unique_id')['partial_sum'].agg('sum')   # sum the products within each group

Output:

unique_id
1    20.629427
2    20.672208

or, without the intermediate column:

df['weight'].mul(df['value']).groupby(df['unique_id']).sum()

which gives the same output.

Derek Eden
  • This works when the total weights within groups are equal to 1, which happens in this case. Also `agg('sum')` is better than `agg(sum)`. – Quang Hoang Nov 27 '19 at 03:28
  • Well, if you're doing a weighted average, you'd better hope your weights (or normalized weights) sum to 1, or you've got bigger issues :) And thanks. Could you explain why `'sum'` is better than `sum`? Curious. – Derek Eden Nov 27 '19 at 03:31
  • 3
    `agg('sum')` uses the vectorized version, which is equivalent to `np.sum` or `pd.Series.sum()`, while `agg(sum)` uses python's non-vectorized `sum` – Quang Hoang Nov 27 '19 at 03:32
  • @DerekEden: Did you actually get the unique_id column in the results? Asking because I did not, and I need it to relate this to another dataframe. – user_11077035 Nov 27 '19 at 03:52
  • I copy-pasted your data into a CSV, read it in, and ran those commands, and got that output. The name 'unique_id' belongs to the index, not to the resulting series. – Derek Eden Nov 27 '19 at 04:05
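
As that comment exchange shows, the group keys end up in the result's index, not as a column. If you need unique_id back as a regular column (say, to merge with another DataFrame), reset_index restores it; a short sketch, assuming out is the Series from the answer above:

# Move unique_id from the index back to an ordinary column
result = out.reset_index(name='weighted_sum')

#    unique_id  weighted_sum
# 0          1     20.629427
# 1          2     20.672208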

You can take advantage of the @ operator (the dot product) inside apply. Note that agg applies a function column by column, so a dot product of two columns needs apply on the whole group instead:

df.groupby('unique_id').apply(lambda g: g['weight'] @ g['value'])

Out[24]:
unique_id
1    20.629427
2    20.672208
dtype: float64
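
For two aligned Series, @ is just Series.dot, i.e. the same thing as multiplying elementwise and then summing. A quick check on one group:

g = df[df['unique_id'] == 1]  # one group, for illustration

assert g['weight'] @ g['value'] == g['weight'].dot(g['value'])
assert abs((g['weight'] * g['value']).sum() - g['weight'] @ g['value']) < 1e-9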
Andy L.