Suppose we want to do calculations that combine several columns within groups.
The original DataFrame:
import pandas as pd

data = {'order_id': [1, 1, 1, 2, 2, 3],
        'quantity': [1, 3, 1, 1, 2, 2],
        'item_price': [10, 6, 4, 5, 3, 6]}
df = pd.DataFrame(data, columns=['order_id', 'quantity', 'item_price'])
order_id | quantity | item_price
1        | 1        | 10
1        | 3        | 6
1        | 1        | 4
2        | 1        | 5
2        | 2        | 3
3        | 2        | 6
I want to calculate the total price of each order (the sum of quantity * item_price over the order's rows) and attach it to every row, like this:
order_id | quantity | item_price | order_price
1        | 1        | 10         | 32
1        | 3        | 6          | 32
1        | 1        | 4          | 32
2        | 1        | 5          | 11
2        | 2        | 3          | 11
3        | 2        | 6          | 12
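For reference, each order_price is the sum of quantity * item_price over the rows of that order; for order 1 that is 1*10 + 3*6 + 1*4 = 32. The short plain-Python check below (no pandas, just the three lists from the data dict) reproduces the totals and broadcasts them back to the rows:

```python
from collections import defaultdict

order_id = [1, 1, 1, 2, 2, 3]
quantity = [1, 3, 1, 1, 2, 2]
item_price = [10, 6, 4, 5, 3, 6]

# Accumulate quantity * item_price per order
totals = defaultdict(int)
for oid, q, p in zip(order_id, quantity, item_price):
    totals[oid] += q * p

# Broadcast each order's total back to every row of that order
order_price = [totals[oid] for oid in order_id]

print(dict(totals))   # {1: 32, 2: 11, 3: 12}
print(order_price)    # [32, 32, 32, 11, 11, 12]
```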
I get this by first adding a helper column item_price_total:
df['item_price_total'] = df['quantity'] * df['item_price']
and then using df.groupby('order_id')['item_price_total'].transform('sum'):
order_id | quantity | item_price | item_price_total | order_price
1        | 1        | 10         | 10               | 32
1        | 3        | 6          | 18               | 32
1        | 1        | 4          | 4                | 32
2        | 1        | 5          | 5                | 11
2        | 2        | 3          | 6                | 11
3        | 2        | 6          | 12               | 12
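The helper-column approach described above, put together as one runnable script with the import included. transform('sum') returns a Series with the same index as df, which is why it can be assigned straight back as a column:

```python
import pandas as pd

data = {'order_id': [1, 1, 1, 2, 2, 3],
        'quantity': [1, 3, 1, 1, 2, 2],
        'item_price': [10, 6, 4, 5, 3, 6]}
df = pd.DataFrame(data)

# Step 1: per-row line total
df['item_price_total'] = df['quantity'] * df['item_price']

# Step 2: sum the line totals within each order; the result is aligned
# with df's row index, so every row receives its order's total
df['order_price'] = df.groupby('order_id')['item_price_total'].transform('sum')

print(df)
```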
My question is how to get order_price directly from quantity and item_price, grouped on order_id, without creating item_price_total. My idea was to use groupby(['order_id']).apply() with a lambda function, but after many attempts I still haven't found a solution.
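A sketch of two ways to skip the helper column, assuming a reasonably recent pandas. Option 1 computes the product as an unnamed Series and groups it by df['order_id'] directly (groupby accepts a Series of keys aligned on the index), then uses transform('sum') as before. Option 2 follows the groupby().apply() idea: the lambda returns one scalar per order, giving a Series indexed by order_id, and Series.map broadcasts it back to the rows. Selecting only ['quantity', 'item_price'] before apply keeps the grouping column out of the lambda's input, which newer pandas versions warn about. The column name order_price_apply is purely illustrative.

```python
import pandas as pd

data = {'order_id': [1, 1, 1, 2, 2, 3],
        'quantity': [1, 3, 1, 1, 2, 2],
        'item_price': [10, 6, 4, 5, 3, 6]}
df = pd.DataFrame(data)

# Option 1: group the product Series by the order_id column (aligned on the index);
# transform('sum') returns a Series with the same index as df
df['order_price'] = (df['quantity'] * df['item_price']).groupby(df['order_id']).transform('sum')

# Option 2: groupby().apply() with a lambda returning one total per order,
# which yields a Series indexed by order_id ...
totals = (df.groupby('order_id')[['quantity', 'item_price']]
            .apply(lambda g: (g['quantity'] * g['item_price']).sum()))
# ... then map each row's order_id to its order total
# (order_price_apply is a hypothetical column name for comparison)
df['order_price_apply'] = df['order_id'].map(totals)

print(df)
```

Option 1 stays vectorized and avoids calling a Python function per group, so it is usually the simpler choice; Option 2 is mainly useful when the per-group calculation cannot be expressed as a column-wise product followed by a built-in aggregation.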