How to split a dataframe and store it in multiple sheets of an Excel file

Question

I have a dataframe like the one shown below:

import numpy as np
import pandas as pd
from numpy.random import default_rng
rng = default_rng(100)
cdf = pd.DataFrame({'Id':[1,2,3,4,5],
                   'customer': rng.choice(list('ACD'),size=(5)),
                   'region': rng.choice(list('PQRS'),size=(5)),
                   'dumeel': rng.choice(list('QWER'),size=(5)),
                   'dumma': rng.choice((1234),size=(5)),
                   'target': rng.choice([0,1],size=(5))
})

I would like to do the following:

a) Extract the data for each unique combination of region and customer (that is, a groupby).

b) Store each group in its own sheet of a single Excel file (one sheet per group).

I was trying something like the code below, but there should be a neat, Pythonic way to do this:

df_list = []
grouped = cdf.groupby(['customer','region'])
for k,v in grouped:
    for i in range(len(k)):
        df = cdf[(cdf['customer']==k[i] & cdf['region']==k[i+1])]
        df_list.append(df)

I expect my output to be like below (showing in multiple screenshots).

As my real data has 200 columns and a million rows, any efficient and elegant approach would be really helpful.

jezrael · Accepted Answer · 2022-02-22T10:43:05.220

3

Loop over the groups and write each one to its own sheet:

# Write each (customer, region) group to its own sheet of one workbook.
# The context manager saves and closes the file on exit
# (ExcelWriter.save() was removed in pandas 2.0).
with pd.ExcelWriter('out.xlsx', engine='xlsxwriter') as writer:
    for (cust, reg), v in cdf.groupby(['customer','region']):
        v.to_excel(writer, sheet_name=f"DATA_{cust}_{reg}")
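
With real data, the group labels that go into `sheet_name` may not be valid Excel sheet names: Excel limits names to 31 characters, forbids the characters `[ ] : * ? / \`, does not allow a leading or trailing apostrophe, and treats names as case-insensitive. The following is a minimal sketch of one way to clean the names before writing. The helper `safe_sheet_name` and the `used_names` set are hypothetical names introduced for illustration, not part of pandas or xlsxwriter.

```python
import re

# Characters Excel does not allow in worksheet names.
_INVALID_CHARS = re.compile(r"[\[\]:*?/\\]")
MAX_SHEET_NAME_LEN = 31  # Excel's limit on worksheet name length


def safe_sheet_name(raw, used):
    """Return a valid, unique worksheet name derived from `raw`.

    `used` is a set of lower-cased names already taken; it is updated in place.
    (Hypothetical helper, shown only as an illustration.)
    """
    # Replace forbidden characters, truncate, then drop edge apostrophes.
    name = _INVALID_CHARS.sub("_", str(raw))[:MAX_SHEET_NAME_LEN].strip("'")
    name = name or "Sheet"
    candidate, n = name, 1
    # Excel compares sheet names case-insensitively, so compare lower-cased.
    while candidate.lower() in used:
        suffix = f"_{n}"
        candidate = name[:MAX_SHEET_NAME_LEN - len(suffix)] + suffix
        n += 1
    used.add(candidate.lower())
    return candidate


used_names = set()
print(safe_sheet_name("DATA_A_P", used_names))
print(safe_sheet_name("DATA_North/America_Q", used_names))
```

In the loop above, this would be used as `sheet_name=safe_sheet_name(f"DATA_{cust}_{reg}", used_names)`, with `used_names` created once before the loop.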

edited Feb 22 '22 at 10:43

answered Feb 22 '22 at 10:37


WOW. so fast. will try and update you. upvoted – The Great Feb 22 '22 at 10:38
But would a for loop work for a big dataset of a million rows, where the number of groups may be higher? – The Great Feb 22 '22 at 10:38
@TheGreat - in my experience the bottleneck is the `to_excel` function, so looping over the groups is no problem. – jezrael Feb 22 '22 at 10:39
@TheGreat - answer was edited. – jezrael Feb 22 '22 at 10:43
@TheGreat - I have never worked with `StyleFrame` :( – jezrael Feb 22 '22 at 13:37
@TheGreat - unfortunately not. – jezrael Feb 22 '22 at 13:39
For the same task, do you know how we can retain the format, font and color of the Excel file when creating each sheet? Meaning, the master file has a certain color, format, font etc., and we would like to have the exact same font, format and color in the child file (multiple sheets). I will link the new post here – The Great Feb 23 '22 at 05:52
@TheGreat - Unfortunately I have no experience with `StyleFrame`, which would be necessary here. – jezrael Feb 23 '22 at 06:28
Ah okay. we can't do that with excelwriter? etc – The Great Feb 23 '22 at 06:29
@TheGreat - Honestly no idea. I guess not. Maybe I am wrong, because with excel styles processing I know only basics. – jezrael Feb 23 '22 at 06:30
Do you know whether it is possible to write the above output to an already existing file (`dummy.xlsx`) instead of `out.xlsx` as shown in the post? – The Great Feb 23 '22 at 07:01
@TheGreat - Check [this](https://stackoverflow.com/a/42375263/2901002) solution. – jezrael Feb 23 '22 at 07:03
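
For the follow-up about writing into an already existing workbook, here is a minimal sketch of one possible approach. It assumes pandas 1.3 or later with the openpyxl engine installed, since xlsxwriter can only create new files. The function name `append_groups` is hypothetical, and `if_sheet_exists="replace"` is just one possible policy for name clashes. It is not necessarily what the linked answer does.

```python
import pandas as pd


def append_groups(df, path):
    """Add one sheet per (customer, region) group to the workbook at `path`.

    Hypothetical helper; assumes `path` already exists and openpyxl is installed.
    """
    # mode="a" opens the existing file instead of overwriting it.
    # Existing sheets with the same name are replaced; other sheets are kept.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        for (cust, reg), group in df.groupby(["customer", "region"]):
            group.to_excel(writer, sheet_name=f"DATA_{cust}_{reg}", index=False)
```

Calling `append_groups(cdf, "dummy.xlsx")` would then add the group sheets next to whatever sheets `dummy.xlsx` already contains.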
