1

I have an issue that I cannot figure out although I read similar posts such as Pandas groupby to to_csv. It does not work for me. I am trying to write code to separate each group from a groupby object and save each group into its own excel spreadsheet.

I attached a toy example of the code that I did to get my groupby object on pandas with some columns.

Now, I need to save each group from this object into a separate csv file, or at least in a separate worksheet in excel.

dff = pd.DataFrame({'SKU': ['001', '002', '003'],
                    'revenue_contribution_in_percentage': [0.2, 0.5, 0.3],
                    'BuyPrice' : [2,3,4],
                    'SellPrice' : [5,6,6],
                    'margin' : [3,3,2],
                    'Avg_per_week' : [3,2,5],
                    'StockOnHand' : [4,10,20],
                            'StockOnOrder': [0,0,0],
                            'Supplier' : ['ABC', 'ABC', 'ABZ' ],
                            'SupplierLeadTime': [5,5,5],
                            'cumul_value':[0.4,0.6,1],
                            'class_mention':['A','A','B'],
                            'std_week':[1,2,1],
                            'review_time' : [2,2,2],
                            'holding_cost': [0.35, 0.35, 0.35],
                            'aggregate_order_placement_cost': [200, 230,210]
})

I have done the following to get a groupby supplier object

groups = [group.reset_index().set_index(['SKU'])[[
                            'revenue_contribution_in_percentage',
                            'BuyPrice',
                            'SellPrice',
                            'margin',
                            'Avg_per_week',
                            'StockOnHand',
                            'StockOnOrder',
                            'Supplier',
                            'SupplierLeadTime',
                            'cumul_value',
                            'class_mention',
                            'std_week',
                            'review_time',
                            'holding_cost',
                            'aggregate_order_placement_cost',
                            'periods']] for _, group in dff.groupby('Supplier')]

df_group = pd.DataFrame(groups).sum()
group_to_excel = df_group.to_csv('results.csv')

and the output that I would like to get is the folowing: two distinct datasets that can saved in csv format and look like this:

   SKU  revenue_contribution_in_percentage  BuyPrice  SellPrice  margin  \
0  001                                 0.2         2          5       3   
1  002                                 0.5         3          6       3   

   Avg_per_week  StockOnHand  StockOnOrder Supplier  SupplierLeadTime  \
0             3            4             0      ABC                 5   
1             2           10             0      ABC                 5   

   cumul_value class_mention  std_week  review_time  holding_cost  \
0          0.4             A         1            2          0.35   
1          0.6             A         2            2          0.35   

   aggregate_order_placement_cost  
0                             200  
1                             230  

and

   SKU  revenue_contribution_in_percentage  BuyPrice  SellPrice  margin  \
0  003                                 0.3         4          6       2   

   Avg_per_week  StockOnHand  StockOnOrder Supplier  SupplierLeadTime  \
0             5           20             0      ABZ                 5   

   cumul_value class_mention  std_week  review_time  holding_cost  \
0            1             B         1            2          0.35   

   aggregate_order_placement_cost  
0                             210  

At this point my code give one and only worksheet (horrendous worksheet) with pretty much nothing on it. I am not sure what is wrong at this point. I would greatly appreciate some help on this one! thanks a lot!

Murcielago
  • 905
  • 1
  • 8
  • 30
  • Can you specify what you want in the spreadsheet? Ideally with some dummy data so it is possible to reproduce? We do not know what the variable group contains. – divingTobi Jan 06 '20 at 13:44
  • where is df2? i'm getting an error on `groups` – Umar.H Jan 06 '20 at 15:29
  • sorry for the typo, df2 is actually dff, I just updated my code. – Murcielago Jan 06 '20 at 15:33
  • This answer can help with the accessing groups by keys: https://stackoverflow.com/questions/14734533/how-to-access-pandas-groupby-dataframe-by-key Access each group on loop and save them individually – Adarsh Chavakula Jan 06 '20 at 15:33

1 Answers1

1

You don't need groupby as you are not aggregating anything. What you really want is to slice dff by each unique supplier and export them to their own files. Try this:

cols = [
    'SKU',
    'revenue_contribution_in_percentage',
    'BuyPrice',
    'SellPrice',
    'margin',
    'Avg_per_week',
    'StockOnHand',
    'StockOnOrder',
    'Supplier',
    'SupplierLeadTime',
    'cumul_value',
    'class_mention',
    'std_week',
    'review_time',
    'holding_cost',
    'aggregate_order_placement_cost'
]

for supplier in dff['Supplier'].unique():
    sub_dff = dff[dff['Supplier'] == supplier][cols]
    sub_dff.to_csv(f'{supplier}_data.csv')
Code Different
  • 90,614
  • 16
  • 144
  • 163