1

This code used to get a xlsx file and write over it, but after updating from pandas 1.1.5 to 1.5.1 I got zipfile.badzipfile file is not a zip file

Then I read here that after pandas 1.2.0 the pd.ExcelWriter(report_path, engine='openpyxl') creates a new file but as this is a completely empty file, openpyxl cannot load it.

Knowing that, I changed the code to this one, but now I'm getting AttributeError: property 'sheets' of 'OpenpyxlWriter' object has no setter. How should I handle this?

book = load_workbook('Resultados.xlsx')
    writer = pd.ExcelWriter('Resultados.xlsx', engine='openpyxl')
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

    reader = pd.read_excel(r'Resultados.xlsx')

    df = pd.DataFrame.from_dict(dict_)
    df.to_excel(writer, index=False, header=False, startrow=len(reader) + 1)

    writer.close()
Moises Felipe
  • 83
  • 2
  • 9

3 Answers3

3

try this:

filepath = r'Resultados.xlsx'
with pd.ExcelWriter(
        filepath,
        engine='openpyxl',
        mode='a',
        if_sheet_exists='overlay') as writer:
    reader = pd.read_excel(filepath)
    df.to_excel(
        writer,
        startrow=reader.shape[0] + 1,
        index=False,
        header=False)
ziying35
  • 1,190
  • 3
  • 6
3

TLDR

  • Use .update to modify writer.sheets
  • Rearrange the order of your script to get it working
# run before initializing the ExcelWriter
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")

book = load_workbook("Resultados.xlsx")

# use `with` to avoid other exceptions
with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
    writer.book = book
    writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))

    df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)

Details

Recreating your problem with some fake data

import numpy as np
from openpyxl import load_workbook
import pandas as pd


if __name__ == "__main__":

    # make some random data
    np.random.seed(0)
    df = pd.DataFrame(np.random.random(size=(5, 5)))

    # this makes an existing file
    with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
        df.to_excel(excel_writer=writer)

    # make new random data
    np.random.seed(1)
    df = pd.DataFrame(np.random.random(size=(5, 5)))

    # what you tried...
    book = load_workbook("Resultados.xlsx")
    writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

    reader = pd.read_excel("Resultados.xlsx")

    # skipping this step as we defined `df` differently
    # df = pd.DataFrame.from_dict(dict_)
    df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)

    writer.close()

We get the same error plus a FutureWarning

...\StackOverflow\answer.py:23: FutureWarning: Setting the `book` attribute is not part of the public API, usage can give unexpected or corrupted results and will be removed in a future version
  writer.book = book
Traceback (most recent call last):
  File "...\StackOverflow\answer.py", line 24, in <module>
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
AttributeError: can't set attribute 'sheets'

The AttributeError is because sheets is a property of the writer instance. If you're unfamiliar with it, here is a resource.

In shorter terms, the exception is raised because sheets cannot be modified in the way you're trying. However, you can do this:

# use the `.update` method
writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))

That will move us past the the AttributeError, but we'll hit a ValueError a couple lines down:

reader = pd.read_excel("Resultados.xlsx")
Traceback (most recent call last):
  File "...\StackOverflow\answer.py", line 26, in <module>
    reader = pd.read_excel("Resultados.xlsx")
  ...
  File "...\lib\site-packages\pandas\io\excel\_base.py", line 1656, in __init__
    raise ValueError(
ValueError: Excel file format cannot be determined, you must specify an engine manually.

Do what the error message says and supply an argument to the engine parameter

reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")

And now we're back to your original zipfile.BadZipFile exception

Traceback (most recent call last):
  File "...\StackOverflow\answer.py", line 26, in <module>
    reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
  ...
  File "...\Local\Programs\Python\Python310\lib\zipfile.py", line 1334, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

After a bit of toying, I noticed that the Resultados.xlsx file could not be opened manually after running this line:

writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")

excel error message

So I reordered some of the steps in your code:

    # run before initializing the ExcelWriter
    reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")

    book = load_workbook("Resultados.xlsx")

    # the old way
    # writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")

    with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
        writer.book = book
        writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))

        df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
Ian Thompson
  • 2,914
  • 2
  • 18
  • 31
0

What worked for me was to replace writer.book and writer.sheets with writer.workbook and writer.worksheets respectively.