I'm trying to find the weighted median of a column based on multiple columns. Here is an example:
Date | Item | BetterPrice | TotalCost | Location
-------------------------------------------------------------------------
2022-03-01 | AB | 0 | 200 | 3
2022-03-01 | AB | 0 | 200 | 2
2022-03-01 | AB | 1 | 300 | 3
2022-03-01 | AC | 1 | 400 | 2
2022-04-01 | AB | 1 | 400 | 1
2022-04-01 | AC | 1 | 100 | 3
2022-04-01 | AC | 0 | 50 | 1
I want to find weighted median of the Location
column and I want to use the TotalCost
column as weights. I want to use aggregate TOO since I want to find the sum of the BetterPrice column as well.
Originally I tried to used the package wquantiles
with the weighted.median
function, and to do the above tasks I tried something similar to the following code:
import wquantiles
wm = lambda x: weighted.median(x , TotalCost)
df2 = df.groupby(['Date', 'Item']).agg({'BetterPrice': 'sum', "Location": wm}).reset_index()
Unfortunately, this doesn't seem to work, so I thought I should try to ask how best to accomplish this task, thank you!
Also, here is a dataframe to replicate my above problem (hopefully it is detailed enough)
import pandas as pd
data={'Date':['2022-03-01','2022-03-01','2022-03-01','2022-03-01', '2022-04-01', '2022-04-01', '2022-04-01'],'Item':['AB','AB','AB','AC', 'AB', 'AC', 'AC'],'BetterPrice':[0,0,1,1, 1, 1, 0],'TotalCost':[200,200,300,400, 400, 100, 50],'Location':[3,2,3,2,1,3, 1]}
df=pd.DataFrame(data)