An .xlsx workbook is a zipped archive of xml files. According to the API reference the xml file for a worksheet can contain a 'worksheet properties' element (<sheetPr>
) with an attribute syncVertical
(without an h
). However opening a workbook with syncVertical
in Excel causes an error, while synchVertical
works fine. Other software seems to have followed Excel in creating workbooks with the 'wrong' spelling, while openpyxl
only accepts syncVertical
as per the specs.
Hopefully openpyxl will follow other software in accepting the misspelt attribute. In the meantime a fix is to remove the attribute. This can be done manually by opening the workbook in Excel and saving it again, which seems to remove the attribute. Alternatively, we can adapt this answer to edit the zip archive. Unfortunately it is a bit slow as it has to read and write the whole archive just to remove this one attribute. As a hacky quick solution we use a simple find/replace to take out the unwanted property. A better but slower solution would be to parse the xml files properly.
import tempfile
from zipfile import ZipFile
import shutil
import os
from fnmatch import fnmatch
def change_in_zip(file_name, name_filter, change):
tempdir = tempfile.mkdtemp()
try:
tempname = os.path.join(tempdir, 'new.zip')
with ZipFile(file_name, 'r') as r, ZipFile(tempname, 'w') as w:
for item in r.infolist():
data = r.read(item.filename)
if fnmatch(item.filename, name_filter):
data = change(data)
w.writestr(item, data)
shutil.move(tempname, file_name)
finally:
shutil.rmtree(tempdir)
change_in_zip("opto_data.xlsx",
name_filter='xl/worksheets/*.xml', # the problematic property is found in the worksheet xml files
change=lambda d: d.replace(b' synchVertical="1"', b' '))