Pandas to_excel function results in corrupt excel file in docker?

Question

There are a couple of other threads about similar issues, but none of them have had solutions that work for me. I'm running python in a Docker in an AWS EC2 instance, and sending the files to an output folder that sends them to slack. This works fine for CSV files, images, text, and HTML. The Excel document is generated, but can't be opened because of this error: "this file is corrupt and cannot be opened".

I've been testing this on a dummy dataframe that hasn't got any complicated information, and I'm still getting this problem. I've tried the following:

This is the most simple version, still causing the issue:

df_test.to_excel(os.path.join(pathtest, filename))

Basically using the to_excel function in its most basic state with a dataframe. The output exists, but the error occurs on opening the excel file. There is no error when running the code.

df = pd.util.testing.makeDataFrame()

pathtest = ('./output')

filename = "testoutput.xlsx"

Below is an attempted summary of everything I've tried. Apologies, it's a lot.

I've tried this with header and index both true and false. I've also tried setting the engine and writer.

writer = pd.ExcelWriter(path, engine='xlsxwriter')

result_df1.to_excel(writer, sheet_name='results')

with pd.ExcelWriter(path, engine='xlsxwriter', engine_kwargs={'options': {'strings_to_urls': False, 'strings_to_formulas': False}}) as writer:
    result_df1.to_excel(writer, sheet_name="Sheet1")
    df1.to_excel(writer, sheet_name="Sheet2")
    df2.to_excel(writer, sheet_name="Sheet3")

I've also tried this with several different context managers; I've tried using append mode and openpyxl.

I've also tried removing styles from pandas, since I read in a different thread/github issue that header styles can cause conflicts.

pd.io.formats.style.Styler.format.header_style = None

I've tried using several different formations of dataframe_to_rows:

from openpyxl import Workbook
from openpyxl.cell.cell import WriteOnlyCell
from openpyxl.utils.dataframe import dataframe_to_rows
wb = Workbook(write_only=True)
ws1 = wb.create_sheet()
ws1.title = "results"
cell = WriteOnlyCell(ws1)
def format_first_row(row, cell):
    for c in row:
        cell.value = c
        yield cell
rows = dataframe_to_rows(result_df)
first_row = format_first_row(next(rows), cell)
ws1.append(first_row)
for row in rows:
    row = list(row)
    cell.value = row[0]
    row[0] = cell
    ws1.append(row)

I've also tried every combination of writer.save() and writer.close() I could think of, including not including either and including both, having the save lead to a file and not, etc. I've tried using write only and append. I've tried importing and using xlsxwriter and openpyxl together and separately. I've tried setting the active sheet and not setting it. (I have 133 different combinations-- I'm trying to be as thorough as possible here, but I may have missed some things I've tried just because I've been banging my head against this issue for so long.)

I've literally copied the most basic code samples from the read the docs for to_excel and openpyxl and had the same issue.

The file opens without problems locally, but not in the Docker instance.

pandas~=1.4.3
xlsxwriter==1.2.8
openpyxl==3.0.10

path = ('./output') filename = "testoutput.xlsx"

Sources I've looked at code from include:

The file doesn't include any macros, or any fancy formatting: it's literally just a test dataframe with text and numbers.

There is a lot going on here. Could you reduce it down to a simple working example that demonstrates the issue. — jmcnamara, Jul 28 '22 at 22:16
I don't see that this `pd.io.formats.style.Styler.format.header_style = None` would do anything since it's never referenced in the pandas code. `pd.io.formats.excel.ExcelFormatter.header_style` is a valid and used pandas property though. — Attack68, Jul 29 '22 at 07:43
@jmcnamara: I've tried to clarify-- the most basic working example is the first simple "to_excel" example. Let me know if this clarifies the ask for you. — Jess, Jul 29 '22 at 16:11
@Attack68 - totally makes sense. I was going off this https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.to_excel.html . I've tried with the formats.excel.ExcelFormatter.header_style = none now and that still hasn't fixed anything. — Jess, Jul 29 '22 at 16:21
I just saw an issue on Pandas regarding different environments (https://github.com/pandas-dev/pandas/issues/47871) are you using Python 3.10? Have you tried 3.9? — Attack68, Jul 29 '22 at 21:17

Pandas to_excel function results in corrupt excel file in docker?

0 Answers0