I am looking for help in filtering different dataframes to export to worksheets. Here is a sample dataframe.
import pandas as pd
import numpy as np
np.random.seed(1111)
df = pd.DataFrame({
'Category':np.random.choice( ['Group A','Group B','Group C','Group D'], 10000),
'Sub-Category':np.random.choice( ['X','Y','Z'], 10000),
'Sub-Category-2':np.random.choice( ['G','F','I'], 10000),
'Product':np.random.choice( ['Product 1','Product 2','Product 3'], 10000),
'Units_Sold':np.random.randint(1,100, size=(10000)),
'Dollars_Sold':np.random.randint(100,1000, size=10000),
'Customer':np.random.choice(pd.util.testing.rands_array(10,25,dtype='str'),10000),
'Date':np.random.choice( pd.date_range('1/1/2016','12/31/2018',
freq='D'), 10000)})
Here are different dataframes I'd like to export into Excel workbooks:
df1 = df.groupby(['Category','Sub-Category-2','Product']).agg({'Units_Sold':'sum'})
df2 = df.groupby(['Category','Product',pd.Grouper(key='Date',freq='A-APR')]).agg({'Dollars_Sold':'sum'})
df3 = df.groupby(['Category','Product','Sub-Category']).agg({'Units_Sold':'sum','Dollars_Sold':'sum'})
For each 'Category', I'd like to create a separate Excel workbook with each dataframe in it filtered to show only that specific 'Category'. For example, workbook 'Group A' would have df1, df2, & df3 as separate worksheets in it with the dataframe showing only the values where 'Category' = 'Group A'. Workbook 'Group B' would have the same info, just filtered where 'Category' = 'Group B'.
I know how to do this manually by using .loc, but this seems very slow. My question is how do I do this in a pythonic way? The example data is not large, but my real-world data has 30+ categories in 'Category'. Is there a way to create a function to slice appropriately & kick out dataframes after filtering?