There are a couple of other threads about similar issues, but none of them have had solutions that work for me. I'm running python in a Docker in an AWS EC2 instance, and sending the files to an output folder that sends them to slack. This works fine for CSV files, images, text, and HTML. The Excel document is generated, but can't be opened because of this error: "this file is corrupt and cannot be opened".
I've been testing this on a dummy dataframe that hasn't got any complicated information, and I'm still getting this problem. I've tried the following:
This is the most simple version, still causing the issue:
df_test.to_excel(os.path.join(pathtest, filename))
Basically using the to_excel function in its most basic state with a dataframe. The output exists, but the error occurs on opening the excel file. There is no error when running the code.
df = pd.util.testing.makeDataFrame()
pathtest = ('./output')
filename = "testoutput.xlsx"
Below is an attempted summary of everything I've tried. Apologies, it's a lot.
I've tried this with header and index both true and false. I've also tried setting the engine and writer.
writer = pd.ExcelWriter(path, engine='xlsxwriter')
result_df1.to_excel(writer, sheet_name='results')
with pd.ExcelWriter(path, engine='xlsxwriter', engine_kwargs={'options': {'strings_to_urls': False, 'strings_to_formulas': False}}) as writer:
result_df1.to_excel(writer, sheet_name="Sheet1")
df1.to_excel(writer, sheet_name="Sheet2")
df2.to_excel(writer, sheet_name="Sheet3")
I've also tried this with several different context managers; I've tried using append mode and openpyxl.
I've also tried removing styles from pandas, since I read in a different thread/github issue that header styles can cause conflicts.
pd.io.formats.style.Styler.format.header_style = None
I've tried using several different formations of dataframe_to_rows:
from openpyxl import Workbook
from openpyxl.cell.cell import WriteOnlyCell
from openpyxl.utils.dataframe import dataframe_to_rows
wb = Workbook(write_only=True)
ws1 = wb.create_sheet()
ws1.title = "results"
cell = WriteOnlyCell(ws1)
def format_first_row(row, cell):
for c in row:
cell.value = c
yield cell
rows = dataframe_to_rows(result_df)
first_row = format_first_row(next(rows), cell)
ws1.append(first_row)
for row in rows:
row = list(row)
cell.value = row[0]
row[0] = cell
ws1.append(row)
I've also tried every combination of writer.save() and writer.close() I could think of, including not including either and including both, having the save lead to a file and not, etc. I've tried using write only and append. I've tried importing and using xlsxwriter and openpyxl together and separately. I've tried setting the active sheet and not setting it. (I have 133 different combinations-- I'm trying to be as thorough as possible here, but I may have missed some things I've tried just because I've been banging my head against this issue for so long.)
I've literally copied the most basic code samples from the read the docs for to_excel and openpyxl and had the same issue.
The file opens without problems locally, but not in the Docker instance.
- pandas~=1.4.3
- xlsxwriter==1.2.8
- openpyxl==3.0.10
path = ('./output')
filename = "testoutput.xlsx"
Sources I've looked at code from include:
- https://openpyxl.readthedocs.io/en/latest/usage.html?highlight=keep_vba#write-a-workbook-from-xltm-as-xlsm
- https://www.reddit.com/r/learnpython/comments/mbmf3w/excel_file_getting_corrupted_when_saving_from/
- https://techinplanet.com/excel-file-corrupt-or-wrong-extension-error-openpyxl-writerxlsx/
- openpyxl Set Active Sheet
- pandas.DataFrame.to_excel() corrupting file <- seems to be a similar issue to this person's problem.
- pandas to_excel export corrupt file in docker?
The file doesn't include any macros, or any fancy formatting: it's literally just a test dataframe with text and numbers.