How to sum values in a column based on a given condition using Python pandas?

Question

I have an input dataframe like below:

df = pd.DataFrame({'factory': ['kerala', 'kerala', 'kerala', 'delhi', 'delhi', 'goa', 'goa'],
               'plant': ['', '', '', '', '', '', ''],
               'market': ['', '', '', '', '', '', ''],
               'product': ['A', 'B', 'C', 'A', 'B', 'A', 'B'],
               'BP4-2023': [4, 4, 5, 6, 4, 5, 5],
               'RE4-2023': [7, 7, 8, 8, 7, 8, 8],
               'BP5-2023': [4, 4, 5, 6, 4, 5, 5],
               'RE5-2023': [7, 7, 8, 8, 7, 8, 8]})

And I want to get the output dataframe like below:

  factory    plant   market product  BP4-2023  RE4-2023 BP5-2023  RE5-2023
0 TOTAL                                 
1   A                                15        23       15        23
2   B                                13        22       13        22
3   C                                5         8        5         8

Input dataframe will be dynamic(BP and RE columns).BP and RE columns will vary based on months and year.In this A,B,C are product names.Extra products will come,at that time that products will also be added in the output dataframe.

How can I get the below output.Can anyone suggest a solution?

Try: `df.drop(columns='factory').groupby('product', as_index=False).sum()` — Corralien, May 04 '23 at 14:11
how to keep remaining columns blank like in my expected output@Corralien — AbinBenny, May 05 '23 at 04:20
`df.assign(factory='').groupby('product', as_index=False).sum()` — Corralien, May 05 '23 at 06:17

score 0 · Answer 1 · answered May 04 '23 at 14:18

What you're looking for is pandas.DataFrame.groupby(). E.g. for your problem,

df.groupby(['product']).sum()

Will produce the following output:

         BP4-2023  RE4-2023  BP5-2023  RE5-2023
product                                        
A              15        23        15        23
B              13        22        13        22
C               5         8         5         8

If you want to reuse it you can also wrap it into a function, like so:

In []: def group_sum(df, column):
         print(df.groupby([column]).sum())


In []: group_sum(df, 'product')
         BP4-2023  RE4-2023  BP5-2023  RE5-2023
product                                        
A              15        23        15        23
B              13        22        13        22
C               5         8         5         8

In []: group_sum(df, 'factory')
         BP4-2023  RE4-2023  BP5-2023  RE5-2023
factory                                        
delhi          10        15        10        15
goa            10        16        10        16
kerala         13        22        13        22

how to keep remaining columns blank like in my expected output@worreby — AbinBenny, May 05 '23 at 04:19
@AbinBenny as far as I know that doesn't work if the columns are empty with `groupby`. If you add values to the columns they will be kept though. — worreby, May 05 '23 at 08:42

How to sum values in a column based on a given condition using Python pandas?

1 Answers1