I am trying to write a series of pandas DataFrames to an Excel worksheet such that:
- The existing contents of the worksheet are not overwritten or erased, and
- the Excel column widths are adjusted to fit the lengths of the column entries (so that I don't have to manually do this in Excel).
For 1), I have found an excellent solution in the form of a helper function written by @MaxU: How to write to an existing excel file without overwriting data (using pandas)?. For 2) I found what looked like a good solution here. But when I try to put these solutions together, the column widths don't change at all. Here's my full code:
import pandas as pd
import os
from openpyxl import load_workbook
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
truncate_sheet=False,
**to_excel_kwargs):
"""
Append a DataFrame [df] to existing Excel file [filename]
into [sheet_name] Sheet.
If [filename] doesn't exist, then this function will create it.
@param filename: File path or existing ExcelWriter
(Example: '/path/to/file.xlsx')
@param df: DataFrame to save to workbook
@param sheet_name: Name of sheet which will contain DataFrame.
(default: 'Sheet1')
@param startrow: upper left cell row to dump data frame.
Per default (startrow=None) calculate the last row
in the existing DF and write to the next row...
@param truncate_sheet: truncate (remove and recreate) [sheet_name]
before writing DataFrame to Excel file
@param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
[can be a dictionary]
@return: None
Usage examples:
>>> append_df_to_excel('d:/temp/test.xlsx', df)
>>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)
>>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
index=False)
>>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
index=False, startrow=25)
(c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
"""
# Excel file doesn't exist - saving and exiting
if not os.path.isfile(filename):
df.to_excel(
filename,
sheet_name=sheet_name,
startrow=startrow if startrow is not None else 0,
**to_excel_kwargs)
return
# ignore [engine] parameter if it was passed
if 'engine' in to_excel_kwargs:
to_excel_kwargs.pop('engine')
writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
# try to open an existing workbook
writer.book = load_workbook(filename)
# get the last row in the existing Excel sheet
# if it was not specified explicitly
if startrow is None and sheet_name in writer.book.sheetnames:
startrow = writer.book[sheet_name].max_row
# truncate sheet
if truncate_sheet and sheet_name in writer.book.sheetnames:
# index of [sheet_name] sheet
idx = writer.book.sheetnames.index(sheet_name)
# remove [sheet_name]
writer.book.remove(writer.book.worksheets[idx])
# create an empty sheet [sheet_name] using old index
writer.book.create_sheet(sheet_name, idx)
# copy existing sheets
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
if startrow is None:
startrow = 0
# write out the new sheet
df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
"""
Now attempt to adjust the column widths as necessary so that all the cell contents are visible
in Excel. The code below is taken from https://towardsdatascience.com/how-to-auto-adjust-the-width-of-excel-columns-with-pandas-excelwriter-60cee36e175e.
"""
for column in df:
column_width = max(df[column].astype(str).map(len).max(), len(column))
col_idx = df.columns.get_loc(column)
writer.sheets[sheet_name].set_column(col_idx, col_idx, column_width)
writer.save()
Now I tried testing the function:
df = pd.DataFrame({'A_Very_Long_Column_Name': [10, 20, 30, 20, 15, 30, 45]})
append_df_to_excel("C:/Users/Leonidas/Documents/test.xlsx", df, "Sheet1")
A new Excel workbook named test.xlsx is created along with a sheet named Sheet1, and the contents of df
are written to Sheet1, but the column widths are completely unaffected:
And strangely, when I try to execute the function a second time (without changing the arguments), I get an error:
runcell(2, 'C:/Users/Leonidas/Documents/write_to_excel2.py')
Traceback (most recent call last):
File "C:\Users\Leonidas\Documents\write_to_excel2.py", line 125, in <module>
append_df_to_excel("C:/Users/Leonidas/Documents/test.xlsx", df,
File "C:\Users\Leonidas\Documents\write_to_excel2.py", line 100, in append_df_to_excel
writer.sheets[sheet_name].set_column(col_idx, col_idx, column_width)
AttributeError: 'Worksheet' object has no attribute 'set_column'
I'm pretty confused at this point...Any suggestions for how to fix the code would be greatly appreciated.