0

I have a Jupyter notebook that is pulling data from Athena into a Dataframe and then inserting that dataframe into an Excel Workbook using xlsxwriter

The dates in excel require the custom formatting [$-en-US,1]dd-mm-yy which is being applied but it doesn't appear visually in excel until the cell has been double clicked. eg. they appear as 2023-07-10 but as soon as you double click the cell the formatting is applied and it shows 10-07-23

The dates from the SQL query are all selected using: DateFormat(exampl_date, '%Y-%m-%d') and they appear in the dataframe with data type Object

and then I use this function to create the excel workbook:

def format_excel(dataframe, worksheet):
    ## Creates new sheet
    dataframe.to_excel(writer, sheet_name=worksheet, startrow=3, header=False, index=False)
    workbook = writer.book
    worksheet = writer.sheets[worksheet]
    
    ## Various cell formats to be used later
    default_format = workbook.add_format({
        'font_name': 'Arial',
        'font_size': '10'})
    
    heading_format = workbook.add_format({
        'text_wrap': True,
        'bold': True,
        'valign': 'top',
        'font_color': '#4682B4',
        'font_name': 'Courier New',
        'font_size': '12'})
    
    date_format = workbook.add_format({
        'num_format': '[$-en-US,1]dd-mm-yy'})
    
    ##Applies default format to all cells and sizes columns
    worksheet.set_column('A:B', 23, default_format)
    worksheet.set_column('C:C', 24, default_format)
    worksheet.set_column('D:D', 28, default_format)
    worksheet.set_column('E:M', 19, default_format)
    
    ##Sizes the first three rows
    worksheet.set_row(0, 30)
    worksheet.set_row(1,  1)
    worksheet.set_row(2, 25)
    
    ## adds the heading row
    run_day_formatted = run_day.strftime("%d/%m/%Y")
    worksheet.merge_range('A1:F1', 'sometext ' + run_day_formatted + ' and '  + run_day_formatted, heading_format)
    
    ##Adds the date formatting
    worksheet.set_column('F:M', 19, date_format)
    
    ## adds and formats the column names
    header_format = workbook.add_format({
        'bold': True,
        'text_wrap': True,
        'valign': 'top',
        'fg_color': '#C0C0C0'})

    for col_num, value in enumerate(dataframe.columns.values):
        worksheet.write(2, col_num, value, header_format)

# Open XLSX Writer and create workbook
writer = pd.ExcelWriter(file_name(), engine='xlsxwriter')

# Uses above function to format the excel sheet
format_excel(df_results, 'some name')

# Closes writer and generates excel
writer.close()
print('Excel created')

I think the solution is adding a step that converts the date fields in the dataframe to a date datatype of some description but nothing seems to be working

Edit:

date_columns = [The Date Columns]

for column in date_columns:
    df_results[column] = pd.to_datetime(df_results[column], format='%Y-%m-%d')

This causes the date value in excel to be 2023-07-05 00:00:00 instead of the intended 2023-07-05 and the custom formatting from xlsx writer is completely overwitten by yyyy-mm-dd hh:mm:ss instead of the intended [$-en-US,1]dd-mm-yy

Edit 2: This is what the excel is intended to look like: 1

Notice the value in the bar and the cell are different and the custom format string matches the one from xlsx writer.

I'm trying to get this file to reconcile exactly with an old one which is why i'm trying to recreate this instead of just putting the date i want in the field.

  • Use `pd.to_datetime`. – BigBen Jul 10 '23 at 15:50
  • I don't think this works... I've updated the main post – Joe Morris Jul 10 '23 at 15:59
  • 1
    In Excel dates are stored as *decimals* (OADates) that are displayed as dates based on the cell's style. There's no stored format, only the cell style. There's no `[$-en-US,1]dd-mm-yy` style though and that *DEFINITELY* isn't a US-format. US dates have the month in front – Panagiotis Kanavos Jul 10 '23 at 16:12
  • 1
    Trying to force a "format" by changing the text will only result in incorrect date values. Just save *dates* instead and then use one of the predefined date styles to see how things actually look in Excel. – Panagiotis Kanavos Jul 10 '23 at 16:16
  • 1
    `as you double click the cell the formatting is applied and it shows 10-07-23` I suspect the opposite happens: Excel detects the invalid format string and clears it or fixes it – Panagiotis Kanavos Jul 10 '23 at 16:19
  • I know its not ideal but for better or worse i need this file to reconcile exactly with an older one, i've added an image of the source excel. – Joe Morris Jul 10 '23 at 16:24

1 Answers1

2

The issue here is that Pandas sets a cell format for datetimes and that overrides the column format. Instead you can set the default date format pd.ExcelWriter() parameters:

import pandas as pd
from datetime import datetime, date

# Create a Pandas dataframe from some datetime data.
df = pd.DataFrame(
    {
        "Date and time": [
            datetime(2023, 7, 11, 11, 30, 55),
            datetime(2023, 7, 12, 1, 20, 33),
            datetime(2023, 7, 13, 11, 10),
            datetime(2023, 7, 14, 16, 45, 35),
            datetime(2023, 7, 15, 12, 10, 15),
        ],
        "Dates only": [
            date(2023, 7, 11),
            date(2023, 7, 12),
            date(2023, 7, 13),
            date(2023, 7, 14),
            date(2023, 7, 15),
        ],
    }
)


# Create a Pandas Excel writer using XlsxWriter as the engine.
# Also set the default datetime and date formats.
writer = pd.ExcelWriter(
    "pandas_datetime.xlsx",
    engine="xlsxwriter",
    datetime_format="[$-en-US,1]dd-mm-yy",
    date_format="[$-en-US,1]dd-mm-yy",
)

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name="Sheet1")

# Get the xlsxwriter workbook and worksheet objects in order
# to set the column widths, to make the dates clearer.
workbook = writer.book
worksheet = writer.sheets["Sheet1"]

# Get the dimensions of the dataframe.
(max_row, max_col) = df.shape

# Set the column widths, to make the dates clearer.
worksheet.set_column(1, max_col, 20)

# Close the Pandas Excel writer and output the Excel file.
writer.close()

Output:

enter image description here

jmcnamara
  • 38,196
  • 6
  • 90
  • 108