
I want to loop through my data and save every component based data in separate sheet in the same excel workbook in S3 bucket.

Dataframe df looks as below:

(screenshot of df omitted; it contains a `component` column with values including COMP1, COMP2, and COMP3)

Below is my code:

    today = datetime.datetime.now().strftime('%m_%d_%Y_%H_%M_%S')
    components=["COMP1","COMP2","COMP3"]
    filename = 'auto_export_'+today
    
    for comp in components:
        df1= df[df['component']==comp]
        print(comp)
        print(df1)
        with io.BytesIO() as output:
            with pd.ExcelWriter(output, engine='xlsxwriter') as writer:
                df1.append_df_to_excel(writer, sheet_name=comp, index=False)
            data = output.getvalue()
        s3 = boto3.resource('s3')
        s3.Bucket('mybucket').put_object(Key='folder1/'+filename+'.xlsx', Body=data)

The Excel file is generated, but only the COMP3 data is written to it; the COMP1 and COMP2 sheets are missing. Any guidance on how to fix this problem?

Rick

2 Answers


You are writing each component's dataframe to S3 as its own complete Excel file, overwriting the previous upload each time. Because everything happens inside the for loop, you upload a workbook containing only COMP1, then overwrite it with one containing only COMP2, then overwrite that with one containing only COMP3, which is why only the last component's data survives.

You need to refactor your code like so:

today = datetime.datetime.now().strftime('%m_%d_%Y_%H_%M_%S')
components = ["COMP1", "COMP2", "COMP3"]
filename = 'auto_export_' + today


# First create an in-memory output buffer
with io.BytesIO() as output:

    # Now create the Excel file using the output buffer
    with pd.ExcelWriter(output, engine='xlsxwriter') as writer:

        # Iterate over each component, adding each as a sheet to the Excel file
        for comp in components:
            df1 = df[df['component'] == comp]
            print(comp)
            print(df1)

            # Add this component to the Excel file as a new sheet
            df1.to_excel(writer, sheet_name=comp, index=False)

    # The ExcelWriter block has exited, so the finished workbook has been written
    # into the buffer; only now read the entire Excel file as bytes
    data = output.getvalue()

    # Copy this Excel file to S3
    s3 = boto3.resource('s3')
    s3.Bucket('mybucket').put_object(Key='folder1/' + filename + '.xlsx', Body=data)
Mark B
  • Thanks @mark. I get to see "[ERROR] AttributeError: 'DataFrame' object has no attribute 'append_df_to_excel'" – Rick Feb 12 '23 at 01:36
  • It looks like you are copy/pasting some code from something like this answer: https://stackoverflow.com/a/38075046/13070 without understanding that `append_df_to_excel` is not a built-in function, but something they added in their code. – Mark B Feb 12 '23 at 13:44
  • OK, got it. I'm trying to understand the to_excel function because it's producing a 0B file for me. I'm a newbie and trying to understand your answer. – Rick Feb 12 '23 at 13:57
  • OK, I was able to fix it by changing it to to_excel and adding a writer.save() call: df1.to_excel(writer, sheet_name=comp, index=False); writer.save() – Rick Feb 12 '23 at 14:35
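
To consolidate what the comment thread arrives at, here is a minimal, self-contained sketch (reusing the bucket, key prefix, and `component` column from the question, and assuming `df` already exists) that writes each sheet with to_excel, closes the writer so the finished workbook is flushed into the buffer, and only then reads the bytes and uploads them. It calls writer.close() rather than writer.save(), since save() was deprecated in pandas 1.5 and removed in 2.0:

    import datetime
    import io

    import boto3
    import pandas as pd

    today = datetime.datetime.now().strftime('%m_%d_%Y_%H_%M_%S')
    components = ["COMP1", "COMP2", "COMP3"]
    filename = 'auto_export_' + today

    with io.BytesIO() as output:
        writer = pd.ExcelWriter(output, engine='xlsxwriter')
        for comp in components:
            # One sheet per component
            df[df['component'] == comp].to_excel(writer, sheet_name=comp, index=False)

        # Closing the writer flushes the finished workbook into the buffer;
        # reading the buffer before this point is what produced the 0-byte file
        writer.close()

        data = output.getvalue()
        boto3.resource('s3').Bucket('mybucket').put_object(
            Key='folder1/' + filename + '.xlsx', Body=data)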

Using a list comp and .groupby:

from datetime import datetime
from io import BytesIO

import boto3
import pandas as pd


today = datetime.now().strftime("%m_%d_%Y_%H_%M_%S")
filename = f"auto_export_{today}"

with BytesIO() as output:
    with pd.ExcelWriter(output) as writer:
        # One group per distinct component; each group becomes its own sheet
        for df in [x.reset_index(drop=True) for _, x in df.groupby("component")]:
            sheet_name = df["component"][0]
            df.to_excel(writer, sheet_name, index=False)

    # The ExcelWriter block has exited here, so the buffer holds the complete workbook
    data = output.getvalue()

    s3 = boto3.resource("s3")
    s3.Bucket("mybucket").put_object(Key=f"folder1/{filename}.xlsx", Body=data)
Jason Baker