0

I'm trying to calculate the weighted average of the "prices" column in the following dataframe for each zone, regardless of hour. I want to essentially sum the quantities that match A, divide each individual quantity row by that amount (to get the weights) and then multiply it by the price.

There are about 200 zones, I'm having a hard time writing something that will generically detect that the Zones match, and not have to write df['ZONE'] = 'A' etc. Please help my lost self =)

HOUR:   1,2,3,1,2,3,1,2,3

ZONE:   A,A,A,B,B,B,C,C,C

PRICE:  12,15,16,17,12,11,12,13,15

QUANTITY: 5,6,1 5,7,9 6,3,2

I'm not sure if you can generically write something, but I thought what if I wrote a function where x is my "Zone", create a list with possible zones, and then create a for loop. Here's the function I wrote, doesn't really work - trying to figure out how else I can make it work

def wavgp(x):
    df.loc[df['ZONE'].isin([str(x)])] = x
Mikhail Stepanov
  • 3,680
  • 3
  • 23
  • 24

1 Answers1

1

Here is a possible solution using groupby operation:

weighted_price = df.groupby('ZONE').apply(lambda x: (x['PRICE'] * x['QUANTITY']).sum()/x['QUANTITY'].sum())

Explaination

First we groupby zone , for each of these block (of the same zone) we are going to multiply the price by the quantity and sum these values. We divide this result by the sum of the quantity to get your desired result.

ZONE
A    13.833333
B    12.761905
C    12.818182
dtype: float64

abcdaire
  • 1,528
  • 1
  • 10
  • 21