-1

I have an input dataframe like below:

df = pd.DataFrame({'factory': ['kerala', 'kerala', 'kerala', 'delhi', 'delhi', 'goa', 'goa'],
               'plant': ['', '', '', '', '', '', ''],
               'market': ['', '', '', '', '', '', ''],
               'product': ['A', 'B', 'C', 'A', 'B', 'A', 'B'],
               'BP4-2023': [4, 4, 5, 6, 4, 5, 5],
               'RE4-2023': [7, 7, 8, 8, 7, 8, 8],
               'BP5-2023': [4, 4, 5, 6, 4, 5, 5],
               'RE5-2023': [7, 7, 8, 8, 7, 8, 8]})

And I want to get the output dataframe like below:

  factory    plant   market product  BP4-2023  RE4-2023 BP5-2023  RE5-2023
0 TOTAL                                 
1   A                                15        23       15        23
2   B                                13        22       13        22
3   C                                5         8        5         8

Input dataframe will be dynamic(BP and RE columns).BP and RE columns will vary based on months and year.In this A,B,C are product names.Extra products will come,at that time that products will also be added in the output dataframe.

How can I get the below output.Can anyone suggest a solution?

AbinBenny
  • 59
  • 6

1 Answers1

0

What you're looking for is pandas.DataFrame.groupby(). E.g. for your problem,

df.groupby(['product']).sum()

Will produce the following output:

         BP4-2023  RE4-2023  BP5-2023  RE5-2023
product                                        
A              15        23        15        23
B              13        22        13        22
C               5         8         5         8

If you want to reuse it you can also wrap it into a function, like so:

In []: def group_sum(df, column):
         print(df.groupby([column]).sum())


In []: group_sum(df, 'product')
         BP4-2023  RE4-2023  BP5-2023  RE5-2023
product                                        
A              15        23        15        23
B              13        22        13        22
C               5         8         5         8

In []: group_sum(df, 'factory')
         BP4-2023  RE4-2023  BP5-2023  RE5-2023
factory                                        
delhi          10        15        10        15
goa            10        16        10        16
kerala         13        22        13        22
worreby
  • 1
  • 2
  • how to keep remaining columns blank like in my expected output@worreby – AbinBenny May 05 '23 at 04:19
  • @AbinBenny as far as I know that doesn't work if the columns are empty with `groupby`. If you add values to the columns they will be kept though. – worreby May 05 '23 at 08:42