3

I need to groupby multiple columns & then get Sum in a new column with added If condition. I tried the next code and it worked great with grouping by single column:

df['new column'] = (
    df['value'].where(df['value'] > 0).groupby(df['column1']).transform('sum')
)

However, when I try to group by multiple columns I get an error.

df['new_column'] = (
        df['value'].where(df['value'] > 0).groupby(df['column1', 'column2']).transform('sum')
    )

Error:

->return self._engine.get_loc(casted_key) 
The above exception was the direct cause of the following exception: 
->indexer = self.columns.get_loc(key) 
->raise KeyError(key) from err 
->if is_scalar(key) and isna(key) and not self.hasnans: ('column1', 'column2')

Could you please advise how I should change the code to get the same result but grouping by multiple columns?

Thank you

EugLP
  • 33
  • 5
  • 1
    provide a dataframe sample as a code https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – aestet May 15 '22 at 07:54

1 Answers1

1

Cause of error

  • The syntax to select multiple columns df['column1', 'column2'] is wrong. This should be df[['column1', 'column2']]
  • Even if you use df[['column1', 'column2']] for groupby, pandas will raise another error complaining that the grouper should be one dimensional. This is because df[['column1', 'column2']] returns a dataframe which is a two dimensional object.

How to fix the error?

Hard way:

Pass each of the grouping columns as one dimensional series to groupby

df['new_column'] = (
        df['value']
          .where(df['value'] > 0)
          .groupby([df['column1'], df['column2']]) # Notice the change
          .transform('sum')
)
Easy way:

First assign the masked column values to the target column, then do groupby + transform as you would normally do

df['new_column'] = df['value'].where(df['value'] > 0)
df['new_column'] = df.groupby(['column1', 'column2'])['new_column'].transform('sum')
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53