Suppose we want to do calculations between columns based on groups.

The original dataframe:

import pandas as pd

data = {'order_id': [1, 1, 1, 2, 2, 3],
        'quantity': [1, 3, 1, 1, 2, 2],
        'item_price': [10, 6, 4, 5, 3, 6]}
df = pd.DataFrame(data, columns=['order_id', 'quantity', 'item_price'])
order_id | quantity | item_price 
    1          1           10              
    1          3            6              
    1          1            4               
    2          1            5               
    2          2            3               
    3          2            6              

I want to calculate the total price for each order and repeat it on every row of that order. The result should look like this:

order_id | quantity | item_price | order_price
    1          1           10           32   
    1          3            6           32 
    1          1            4           32  
    2          1            5           11  
    2          2            3           11  
    3          2            6           12

I get this by adding a new column item_price_total:

df['item_price_total'] = df['quantity'] * df['item_price']

Then I assign df.groupby(['order_id'])['item_price_total'].transform('sum') to order_price:

order_id | quantity | item_price | item_price_total | order_price
    1          1           10           10                32   
    1          3            6           18                32 
    1          1            4            4                32  
    2          1            5            5                11  
    2          2            3            6                11  
    3          2            6           12                12

My question is how to get this result directly from quantity and item_price, grouped on order_id, without creating the intermediate item_price_total column. My thought was to use groupby(['order_id']).apply() with a lambda function, but after many attempts I still haven't found a solution.

Simon Z.
  • 3
    You can write it directly with a Series groupby: `df['order_price'] = df['quantity'].mul(df['item_price']).groupby(df['order_id']).transform('sum')` – anky Aug 18 '20 at 15:39
  • 1
    Do as anky suggests. You *can* do this with a `groupby` + `lambda` but that will be much slower. With groupby the goal is generally to do away with `lambda`s in favor of the builtin methods (https://stackoverflow.com/questions/63306888/general-groupby-in-python-pandas-fast-way/63307380#63307380), in this case `sum`, like anky suggests – ALollz Aug 18 '20 at 15:47
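
For reference, here is a minimal sketch of the `groupby` + `lambda` route that the comment above says is possible but slower. It assumes the sample data from the question; the name `totals` is only illustrative. The two value columns are selected before `apply` so the lambda never touches the grouping column.

```python
import pandas as pd

df = pd.DataFrame({
    'order_id': [1, 1, 1, 2, 2, 3],
    'quantity': [1, 3, 1, 1, 2, 2],
    'item_price': [10, 6, 4, 5, 3, 6],
})

# One Python-level call per order: multiply row-wise, then sum within the group.
# The result is a Series indexed by order_id.
totals = (
    df.groupby('order_id')[['quantity', 'item_price']]
      .apply(lambda g: (g['quantity'] * g['item_price']).sum())
)

# Broadcast each order's total back onto its rows.
df['order_price'] = df['order_id'].map(totals)
print(df)
```

The lambda runs once per group in Python, which is why the vectorized `mul(...).groupby(...).transform('sum')` form is usually preferred on large frames.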

1 Answer

Thanks to anky's idea, you can try this:

# Wrapped in parentheses so the method chain can span several lines.
result = (
    df['quantity'].mul(df['item_price'])
    .groupby(df['order_id'])
    .transform('sum')
    .to_frame('order_price')
    .join(df)
)
print(result)

#    order_price  order_id  quantity  item_price
# 0           32         1         1          10
# 1           32         1         3           6
# 2           32         1         1           4
# 3           11         2         1           5
# 4           11         2         2           3
# 5           12         3         2           6
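
If you would rather have order_price as the last column, as in the layout shown in the question, a possible variant is `DataFrame.assign`. This is only a sketch under the same sample data; it returns a new frame and leaves `df` untouched.

```python
import pandas as pd

df = pd.DataFrame({
    'order_id': [1, 1, 1, 2, 2, 3],
    'quantity': [1, 3, 1, 1, 2, 2],
    'item_price': [10, 6, 4, 5, 3, 6],
})

# assign() appends the new column after the existing ones,
# so no join and no intermediate item_price_total column are needed.
result = df.assign(
    order_price=(df['quantity'] * df['item_price'])
    .groupby(df['order_id'])
    .transform('sum')
)
print(result)
```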
A. Nadjar