Split a dataframe into multiple data set based on condition and each sub set into Excel

Question

Can someone please help me here? I do not get any output and I do not get either an error message. I am trying to filter a dataframe into multiple sub set using customer conditions and paste each sub set into Excel worksheets.

Master_data(df) Output A Output B

import pandas as pd
import os
## Belgium\2020\GMC Prep Automation")
from openpyxl import load_workbook
import xlsxwriter
from shutil import copyfile

file = input("please enter excelfile: ")
extension = os.path.splitext(file)[1]
filename = os.path.splitext(file)[0]
pth = "\\we.interbrew.net\\DFSEurope\\Crown Jewels\\Revenue Management\\WEST\\2. BE\\4. MPM Belgium\\2020\\GMC Prep Automation"
newfile = os.path.join(pth, filename+"_2"+extension)
#myfile = os.path.join(pth, Split_Test.xlsx)

df = pd.read_excel(file)
colpick = input("enter column to be splitted: ")        
col = list(set(df[colpick].values))

def sendtoexcel(col):
    copyfile(file, newfile)

    for j in col:
        writer = pd.ExcelWriter(newfile,engine='openpyxl')

        for myname in col:
            mydf=df.loc[df[colpick] == myname]
            mydf.to_excel(writer,sheet_name=myname,index=False)               

        writer.save()

    print("\nCompleted")
    return

What is the output of `df=pd.read_excel(file)`? What does `df` hold? — Mayank Porwal, May 02 '20 at 02:11
Where do you actually call your function? Code simply receives `input` and defines a function. — Parfait, May 02 '20 at 14:30
i have added the master data(input file)`df` and out put snapshot — Biplab1985, May 02 '20 at 14:49

Parfait · Accepted Answer · 2020-05-04T20:20:08.800

0

Assuming user inputs correct file names and existing column, consider groupby run and not a double for loop on the same column. Code is wrapped in try/except in case user enters an incorrect column name or some issue with exporting data fame to Excel.

from openpyxl import load_workbook

...

colpick = input("enter column to be splitted: ")        
colpick = colpick.title().strip()

def sendtoexcel():
    try:
        with pd.ExcelWriter(file, engine='openpyxl') as writer:
            writer.book = load_workbook(file)

            for i, sub in df.groupby([colpick]):             
                sub.to_excel(writer, sheet_name=i, index=False)

            writer.save()

    except Exception as e:
        print(e)

# ACTUALLY RUN FUNCTION
sendtoexcel()

edited May 04 '20 at 20:20

answered May 02 '20 at 14:36

Parfait

104,375
17
94
125

Thanks @Parfait, worked perfect. I need to 2 enhnacemnts to it now 1. Can i define the range to start the pasting. I need the pasting of output to start from cell D36 instead of present A1 2. Can i Paste the output in the same workbook as source – Biplab1985 May 02 '20 at 15:46
Sounds good! Glad to help. For 1, Look into the arguments of [**`to_excel`**](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html). For 2, you should be able to export to existing Excel file. Try using original source file and not *newfile* in `ExcelWriter`. – Parfait May 02 '20 at 16:30
Could you help on the 2nd requirement. I tried to change but it does not work. i need to add the data to the source file itself from sheet2 as sheet1 will hold the original data – Biplab1985 May 04 '20 at 18:15
What does *it does not work* mean? Errors? No data is exported when you use original *file* instead of *newfile*? – Parfait May 04 '20 at 18:18
the code works fine when i am creating a whole new file. but when i trying to save the output in the same file as source(``file```) here. i get an error – Biplab1985 May 04 '20 at 18:34
And do tell what error is that? I assume all you are calling is: `writer = pd.ExcelWriter(file, engine='openpyxl')`. – Parfait May 04 '20 at 18:58
Sorry by error i meant incorrect output.Yes, i am calling `writer = pd.ExcelWriter(file, engine='openpyxl') `. I get a new file with the source data ONLY and the Split of customers do not work – Biplab1985 May 04 '20 at 19:16
New file? I assumed *file* was original workbook. In testing, I actually faced opposite problem where `groupby` split DOES work but overwrites previous workbook. To resolve, I used `openpyxl. load_workbook` from this [solution](https://stackoverflow.com/a/42637697/1422451). Now all previous sheets AND new splits sheets render. See edit. – Parfait May 04 '20 at 20:23

Split a dataframe into multiple data set based on condition and each sub set into Excel

1 Answers1