1

i have a data frame with the following columns:

  • id
  • name
  • product
  • count
  • price
  • discount

and i want to create a sumamry data frame where it shows the sum of how much each client has spend total. with and without discount applied.

i tried the following

summary = df.groupby('client_name')['price','count','discount'].agg([
    ('Total pre discount', df['price']*df['count']),
    ('Discount applied', df['price']*df['count']*df['discount']
])

and im getting this error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

is it even possible to do this in one step?

what am i doing wrong?

Vahn Toan
  • 93
  • 10

2 Answers2

3

Please note that agg() function cannot be used for calculation involving multiple columns. You have to use apply() function instead. Refer to this post for details.

if you had multiple columns that needed to interact together then you cannot use agg, which implicitly passes a Series to the aggregating function. When using apply the entire group as a DataFrame gets passed into the function.

For your case, you have to define a customized function as follows:

def f(x):
    data = {}
    data['Total pre discount'] = (x['price'] * x['count']).sum()
    data['Discount applied'] = (x['price'] * x['count'] * x['discount']).sum()
    return pd.Series(data)

Then perform your desired task by:

df.groupby('client_name').apply(f)

or if you want to use lambda function instead of customized function:

df.groupby('client_name').apply(lambda x: pd.Series({'Total pre discount':  (x['price'] * x['count']).sum(), 'Discount applied': (x['price'] * x['count'] * x['discount']).sum()}))

Run Demonstration

Test Data Creation

data = {'id': ['0100', '0200', '0100', '0200', '0300'], 'client_name': ['Ann', 'Bob', 'Ann', 'Bob', 'Charles'], 'product': ['pen', 'paper', 'folder', 'pencil', 'tray'], 'count': [12, 300, 5, 12, 10], 'price': [2.00, 5.00, 3.50, 2.30, 8.20], 'discount': [0.0, 0.1, 0.15, 0.1, 0.12]}
df = pd.DataFrame(data)
print(df)

Output: 
      id    client_name product count   price   discount
0   0100            Ann     pen    12     2.0       0.00
1   0200            Bob   paper   300     5.0       0.10
2   0100            Ann  folder     5     3.5       0.15
3   0200            Bob  pencil    12     2.3       0.10
4   0300        Charles    tray    10     8.2       0.12

Run New Codes

# Use either one of the following 2 lines of codes:
summary = df.groupby('client_name').apply(f)     # Using the customized function f() defined above
# or using lambda function
summary = df.groupby('client_name').apply(lambda x: pd.Series({'Total pre discount':  (x['price'] * x['count']).sum(), 'Discount applied': (x['price'] * x['count'] * x['discount']).sum()}))

print(summary)

Output:

               Total pre discount   Discount applied
client_name     
        Ann                  41.5              2.625
        Bob                1527.6            152.760
    Charles                  82.0               9.84
SeaBean
  • 22,547
  • 3
  • 13
  • 25
0
df.groupby(['client', 'discount']).agg({'price' : 'sum'}).reset_index()
S3DEV
  • 8,768
  • 3
  • 31
  • 42
acrobat
  • 812
  • 4
  • 7
  • 3
    Hey, could you pls explain your code? Wasn't able to understand how it works? ty! – Aryan Garg Mar 01 '21 at 07:46
  • No problem. It will group by both the client name and discount, then sum the prices for each (your column names might be slightly different, I took these from the bullet points you put at the top of your question). – acrobat Mar 01 '21 at 11:09