4

What I was trying to do is appening dataframe data to an existing legit excel file. I used load_workbook() function from openpyxl, but it systematically returns an error. Here is some code that crashes on my machine:

from openpyxl import load_workbook

report_path = root_folder + '\\log_report.xlsx'
writer = pd.ExcelWriter(report_path, engine='openpyxl')
writer.book = load_workbook(report_path)
writer.close()

Here, log_report.xlsx already exists and have been created through pandas .to_excel(). Before opening with openpyxl load_workbook(), it is possible to open it, to edit it and do anything MS Excel allows. I got the following error returned:

Traceback (most recent call last):
  File "D:/failsafe_counter/main.py", line 419, in <module>
    writer.book = load_workbook(report_path)
  File "D:\failsafe_counter\venv\lib\site-packages\openpyxl\reader\excel.py", line 315, in load_workbook
    reader = ExcelReader(filename, read_only, keep_vba,
  File "D:\failsafe_counter\venv\lib\site-packages\openpyxl\reader\excel.py", line 124, in __init__
    self.archive = _validate_archive(fn)
  File "D:\failsafe_counter\venv\lib\site-packages\openpyxl\reader\excel.py", line 96, in _validate_archive
    archive = ZipFile(filename, 'r')
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python38-32\lib\zipfile.py", line 1269, in __init__
    self._RealGetContents()
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python38-32\lib\zipfile.py", line 1336, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Important aspect of that is this operation left the initial excel file corrupted, and impossible to open again.

Initial file is a legit zipfile (verified through renaming the xlsx into a .zip). Once the little code above returns the error, it turns the Excel file into an empty archive (verified through the same process).

I employed such functions with success on my previous laptop (under windows 7) but since I migrated under windows 10 I'm not able to use them anymore. Both of them were running python 3.8.

Is there any known issue about openpyxl load_workbook() on some configs? Do you have any idea how to fix this, or any workaround?

EGI
  • 117
  • 1
  • 1
  • 8

3 Answers3

9

I was able to replicate the problem. It is pandas related. Everything works just fine up to pandas 1.1.5 In pandas 1.2.0 they did some changes

At the time when you instantiate pd.ExcelWriter with

writer = pd.ExcelWriter(report_path, engine='openpyxl')`

it creates empty file with size 0 bytes and overwrites the existing file and then you get error when try to load it. It is not openpyxl related, because with latest version of openpyxl it works fine with pandas 1.1.5.

The solution - specify mode='a', change the above line to

writer = pd.ExcelWriter(report_path, engine='openpyxl', mode='a')

Alternatively - look at @CharlieClark solution or this solution where they loads the file before instantiating the pd.ExcelWriter..

buran
  • 13,682
  • 10
  • 36
  • 61
  • I had exact same issue, moving from old laptop to new laptop.. same OS (Windows 10) but different Pandas package version. The "mode = 'a'" solved the issue. Thank you!! – acakaliman Jul 13 '22 at 23:35
0

The exception is quite clear: openpyxl cannot read the file because it is not a zipfile.

pd.ExcelWriter(report_path, engine='openpyxl') creates a new file but as this is a completely empty file, openpyxl cannot load it. If you want to work with a file in both Pandas an openpyxl, you have to create a "book" object.

wb = load_workbook(report_path)
writer = pd.ExcelWriter(report_path, engine='openpyxl')
writer.book = wb
Charlie Clark
  • 18,477
  • 4
  • 49
  • 55
  • Issue is that it is a legit xlsx file and thus a legit archive. It can be opened without any issue in excel until I first try to access it through load_workbook(). It's like load_workbook itself corrupts the xlsx file. On top of that, same initial file accessed on my older machine through the same method didn't lead to any error. – EGI Mar 04 '21 at 10:24
  • Just because Excel can open it, does not mean it is a legitimate zipfile. For example, it might be encrypted. – Charlie Clark Mar 04 '21 at 11:47
  • You're right. So I renamed the excel file before opening it with openpyxl by adding a .zip and I've been able to open it. Once I run the little code on it and thus open it with openpyxl, it appears it became an empty zip. Indeed it seems that the load_workbook() function itself causes trouble on my setup for some reasons. I edited my initial post. – EGI Mar 04 '21 at 13:53
0

In my case that problem occour when file is internal in company domain (not public).

user_mpr
  • 11
  • 3