0

I am looking for help in filtering different dataframes to export to worksheets. Here is a sample dataframe.

import pandas as pd
import numpy as np

np.random.seed(1111)
df = pd.DataFrame({
'Category':np.random.choice( ['Group A','Group B','Group C','Group D'], 10000),
'Sub-Category':np.random.choice( ['X','Y','Z'], 10000),
'Sub-Category-2':np.random.choice( ['G','F','I'], 10000),
'Product':np.random.choice( ['Product 1','Product 2','Product 3'], 10000),
'Units_Sold':np.random.randint(1,100, size=(10000)),
'Dollars_Sold':np.random.randint(100,1000, size=10000),
'Customer':np.random.choice(pd.util.testing.rands_array(10,25,dtype='str'),10000),
'Date':np.random.choice( pd.date_range('1/1/2016','12/31/2018',  
                      freq='D'), 10000)})

Here are different dataframes I'd like to export into Excel workbooks:

df1 = df.groupby(['Category','Sub-Category-2','Product']).agg({'Units_Sold':'sum'})
df2 = df.groupby(['Category','Product',pd.Grouper(key='Date',freq='A-APR')]).agg({'Dollars_Sold':'sum'})
df3 = df.groupby(['Category','Product','Sub-Category']).agg({'Units_Sold':'sum','Dollars_Sold':'sum'})

For each 'Category', I'd like to create a separate Excel workbook with each dataframe in it filtered to show only that specific 'Category'. For example, workbook 'Group A' would have df1, df2, & df3 as separate worksheets in it with the dataframe showing only the values where 'Category' = 'Group A'. Workbook 'Group B' would have the same info, just filtered where 'Category' = 'Group B'.

I know how to do this manually by using .loc, but this seems very slow. My question is how do I do this in a pythonic way? The example data is not large, but my real-world data has 30+ categories in 'Category'. Is there a way to create a function to slice appropriately & kick out dataframes after filtering?

keg5038
  • 341
  • 3
  • 13

1 Answers1

0

How about just running

for c in df.Category.unique():
    with pd.ExcelWriter(f"/Users/constantino/Desktop/{c}.xlsx") as writer:
        for i, d in enumerate([df1, df2, df3]):
            d.loc[c].to_excel(writer, sheet_name=f"df{i+1}")
Constantino
  • 2,243
  • 2
  • 24
  • 41
  • Thanks so much for the quick response! This works, however I'd rather not reset the index prior to the operation as the grouping is preferred. Any suggestions on how to make that work? – keg5038 May 17 '19 at 13:31
  • Sure, I've updated the answer to reflect your request. – Constantino May 17 '19 at 16:14
  • Great - thank you so much! That works great & is simple - I should have caught that. Much appreciated! Last request - in the sheet_name section, is it possible to pass the list of dataframes inside of enumerate? My issues is my dataframes are named differently than df1, df2, df3, etc. They may be named view1, pivot2, test3, etc. I'd like the worksheet names to reflect that actual dataframe names if possible. Thanks again! – keg5038 May 17 '19 at 16:24
  • Hi @Constantino - any luck with last question? – keg5038 May 21 '19 at 14:28
  • It's not directly possible, see https://stackoverflow.com/q/18425225/1153897 I wouldn't suggest dong this anyway since the variable names you choose in the script don't mean anything outside the context of that script. Instead, choose a systematic meaningful way to report your data. – Constantino May 22 '19 at 20:06