Export pandas dataframe to xlsx: dealing with the openpyxl issue on python 3.9

Question

Using the latest packages version: openpyxl: 3.0.6 | pandas: 1.2.3 |python: 3.9

The function below was working fine before updating the packages above to the latest version reported.

Now it raises the error: "zipfile.BadZipFile: File is not a zip file".

Such function is really useful and would be great to know if it can be fixed in order to work.

The function below can be run as it is, just replace "pathExport" to your export directory for testing.

    def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                           truncate_sheet=False, 
                           **to_excel_kwargs):
        """
        Append a DataFrame [df] to existing Excel file [filename]
        into [sheet_name] Sheet.
        If [filename] doesn't exist, then this function will create it.
    
        Parameters:
          filename : File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
          df : dataframe to save to workbook
          sheet_name : Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
          startrow : upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
          truncate_sheet : truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
          to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                            [can be dictionary]
    
        Returns: None
    
        (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
        """
        from openpyxl import load_workbook
    
        # ignore [engine] parameter if it was passed
        if 'engine' in to_excel_kwargs:
            to_excel_kwargs.pop('engine')
    
        writer = pd.ExcelWriter(filename, engine='openpyxl')
    
        # Python 2.x: define [FileNotFoundError] exception if it doesn't exist 
        try:
            FileNotFoundError
        except NameError:
            FileNotFoundError = IOError
    
    
        try:
            # try to open an existing workbook
            writer.book = load_workbook(filename)
            
            # get the last row in the existing Excel sheet
            # if it was not specified explicitly
            if startrow is None and sheet_name in writer.book.sheetnames:
                startrow = writer.book[sheet_name].max_row
    
            # truncate sheet
            if truncate_sheet and sheet_name in writer.book.sheetnames:
                # index of [sheet_name] sheet
                idx = writer.book.sheetnames.index(sheet_name)
                # remove [sheet_name]
                writer.book.remove(writer.book.worksheets[idx])
                # create an empty sheet [sheet_name] using old index
                writer.book.create_sheet(sheet_name, idx)
            
            # copy existing sheets
            writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
        except FileNotFoundError:
            # file does not exist yet, we will create it
            pass
    
        if startrow is None:
            startrow = 0
    
        # write out the new sheet
        df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
    
        # save the workbook
        writer.save()

pathExport = r"F:\PYTHON\NB-Suite_python39\MNE\outputData\df.xlsx"
df1 = pd.DataFrame({'numbers': [1, 2, 3],
                    'colors': ['red', 'white', 'blue'],
                    'colorsTwo': ['yellow', 'white', 'blue']
                    })

append_df_to_excel(pathExport, df1, sheet_name="DF1", index=False, startcol=0, startrow=0)

This looks like a duplicate of #66471466 and I think it's because Pandas overwriting the existing workbook with an empty file. Just use a different name for the file after processing. — Charlie Clark, Mar 08 '21 at 15:23
@CharlieClark could I ask you kindly where should I change the name in the function? So I can give it a try and let users here if it works, thanks in advance — Baggio, Mar 08 '21 at 18:54
Instead of just `filename` use `source_filename` with openpyxl and `target_filename` with Pandas. Note, the two must really be different. — Charlie Clark, Mar 09 '21 at 10:55
@CharlieClark thank you. I did my best, but I'm not sure to having understood where should I make exactly those changes. Could you please reply with the code already fix it? So I could test it and confirm if it works or not, I'm sure it will be useful to all other users that will read here, thank you — Baggio, Mar 09 '21 at 11:42
quick update: just noted that exactly same issue was raised by "leeprevost" on the post #20219254 , just under the answer of @MaxU , but it was still not answered — Baggio, Mar 11 '21 at 10:40

buran · Answer 1 · 2021-03-12T12:59:10.723

OK, I was able to replicate the problem. It is pandas related. Everything works just fine up to pandas 1.1.5 In pandas 1.2.0 they did some changes

At the time when you instantiate pd.ExcelWriter with

writer = pd.ExcelWriter(filename, engine='openpyxl')`

it creates empty file with size 0 bytes and overwrites the existing file and then you get error when try to load it. It is not openpyxl related, because with latest version of openpyxl it works fine with pandas 1.1.5.

The solution - specify mode='a', change the above line to

writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')

Alternatively - look at this or this solution where it loads the file before instantiating the pd.ExcelWriter.

EDIT: I've been advised in the comments that with mode='a' it will raise FileNotFoundError in case the file does not exists. Although it's unexpected that it will not create the file in this case, the solution is to move creating the writer inside the existing try block and create a writer with mode w in the except part:

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                           truncate_sheet=False, 
                           **to_excel_kwargs):
        """
        Append a DataFrame [df] to existing Excel file [filename]
        into [sheet_name] Sheet.
        If [filename] doesn't exist, then this function will create it.
     
        Parameters:
          filename : File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
          df : dataframe to save to workbook
          sheet_name : Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
          startrow : upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
          truncate_sheet : truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
          to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                            [can be dictionary]
     
        Returns: None
     
        (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
        """
        from openpyxl import load_workbook
     
        # ignore [engine] parameter if it was passed
        if 'engine' in to_excel_kwargs:
            to_excel_kwargs.pop('engine')
     
        
     
        # Python 2.x: define [FileNotFoundError] exception if it doesn't exist 
        try:
            FileNotFoundError
        except NameError:
            FileNotFoundError = IOError
     
     
        try:
            writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
            # try to open an existing workbook
            writer.book = load_workbook(filename)
             
            # get the last row in the existing Excel sheet
            # if it was not specified explicitly
            if startrow is None and sheet_name in writer.book.sheetnames:
                startrow = writer.book[sheet_name].max_row
     
            # truncate sheet
            if truncate_sheet and sheet_name in writer.book.sheetnames:
                # index of [sheet_name] sheet
                idx = writer.book.sheetnames.index(sheet_name)
                # remove [sheet_name]
                writer.book.remove(writer.book.worksheets[idx])
                # create an empty sheet [sheet_name] using old index
                writer.book.create_sheet(sheet_name, idx)
             
            # copy existing sheets
            writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
        except FileNotFoundError:
            # file does not exist yet, we will create it
            writer = pd.ExcelWriter(filename, engine='openpyxl')
     
        if startrow is None:
            startrow = 0
     
        # write out the new sheet
        df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
     
        # save the workbook
        writer.save()

thanks @buran . Your solution was working only in case you had already a df.xlsx file, but in case there is no already a file, it raised an error, because it could not append to a file that doesn't exist. However your idea helped me to think a workaround to fix that issue, that I integrated in the function as follows: try: f = open(filename) # Do something with the file except IOError: # print("File not accessible") wb = Workbook() ws = wb.active ws.title = sheet_name wb.save(filename) Thank you so much therefore — Baggio, Mar 12 '21 at 12:35
@Baggio, in this case simply move the line inside the existing `try` block and in the `except` block create a `writer` with `mode='w'` (i.e. the default one). I edited my answer accordingly. — buran, Mar 12 '21 at 13:00

score 2 · Accepted Answer · answered Mar 12 '21 at 12:06

The solution is the following:

import pandas as pd

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, startcol=None,
    truncate_sheet=False, resizeColumns=True, na_rep = 'NA', **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file

      resizeColumns: default = True . It resize all columns based on cell content width
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
      na_rep: default = 'NA'. If, instead of NaN, you want blank cells, just edit as follows: na_rep=''


    Returns: None

    *******************

    CONTRIBUTION:
    Current helper function generated by [Baggio]: https://stackoverflow.com/users/14302009/baggio?tab=profile
    Contributions to the current helper function: https://stackoverflow.com/users/4046632/buran?tab=profile
    Original helper function: (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)


    Features of the new helper function:
    1) Now it works with python 3.9 and latest versions of pandas and openpxl
    ---> Fixed the error: "zipfile.BadZipFile: File is not a zip file".
    2) Now It resize all columns based on cell content width AND all variables will be visible (SEE "resizeColumns")
    3) You can handle NaN,  if you want that NaN are displayed as NaN or as empty cells (SEE "na_rep")
    4) Added "startcol", you can decide to start to write from specific column, oterwise will start from col = 0

    *******************



    """
    from openpyxl import load_workbook
    from string import ascii_uppercase
    from openpyxl.utils import get_column_letter
    from openpyxl import Workbook

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    try:
        f = open(filename)
        # Do something with the file
    except IOError:
        # print("File not accessible")
        wb = Workbook()
        ws = wb.active
        ws.title = sheet_name
        wb.save(filename)

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')


    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError


    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)

        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)

        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass

    if startrow is None:
        # startrow = -1
        startrow = 0

    if startcol is None:
        startcol = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, startcol=startcol, na_rep=na_rep, **to_excel_kwargs)


    if resizeColumns:

        ws = writer.book[sheet_name]

        def auto_format_cell_width(ws):
            for letter in range(1,ws.max_column):
                maximum_value = 0
                for cell in ws[get_column_letter(letter)]:
                    val_to_check = len(str(cell.value))
                    if val_to_check > maximum_value:
                        maximum_value = val_to_check
                ws.column_dimensions[get_column_letter(letter)].width = maximum_value + 2

        auto_format_cell_width(ws)

    # save the workbook
    writer.save()

Example Usage:

# Create a sample dataframe
df = pd.DataFrame({'numbers': [1, 2, 3],
                    'colors': ['red', 'white', 'blue'],
                    'colorsTwo': ['yellow', 'white', 'blue'],
                    'NaNcheck': [float('NaN'), 1, float('NaN')],
                    })

# EDIT YOUR PATH FOR THE EXPORT 
filename = r"C:\DataScience\df.xlsx" 

# RUN ONE BY ONE IN ROW THE FOLLOWING LINES, TO SEE THE DIFFERENT UPDATES TO THE EXCEL FILE
append_df_to_excel(filename, df, index=False, startrow=0) # Basic Export of df in default sheet (Sheet1)
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0) # Append the sheet "Cool" where "df" is written
append_df_to_excel(filename, df, sheet_name="Cool", index=False) # Append another "df" to the sheet "Cool", just below the other "df" instance
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0, startcol=5) # Append another "df" to the sheet "Cool" starting from col 5
append_df_to_excel(filename, df, index=False, truncate_sheet=True, startrow=10, na_rep = '') # Override (truncate) the "Sheet1", writing the df from row 10, and showing blank cells instead of NaN

score 0 · Answer 3 · edited Aug 12 '23 at 00:37

0

First use xlsxwriter to write the first sheet.

Then use openpyxl to add new sheets.

There will be no error.

edited Aug 12 '23 at 00:37

toyota Supra

3,181
4
15
19

answered Aug 11 '23 at 21:56

Swapnanil

1

score 0 · Answer 4 · edited Aug 31 '23 at 08:33

# First create a workbook (empty)
wb = Workbook()
            
# Remove the default "Sheet"
default_sheet = wb['Sheet']<br/>
wb.remove(default_sheet)
            
# Write "sample" sheet - this is a sheet loaded from another excel file 
new_sheet = excel_file.parse("sample")<br/>
writer = pd.ExcelWriter(output_file_path, engine='openpyxl')<br/>
writer.book = wb<br/>
new_sheet.to_excel(writer, sheet_name="sample", index=False)
            
# Save and close the new Excel file
writer.save()

Export pandas dataframe to xlsx: dealing with the openpyxl issue on python 3.9

4 Answers4

Linked