
I'm using Python and pandas to query a SQL table, store the result in a DataFrame, then write it to an Excel file (.xlsx).

I'm then using a couple of VBA macros to loop through the columns and do some conditional formatting to highlight outliers.

Everything works fine except the date column, which Excel gets stuck on, raising this error:

"Method 'Average' of object 'WorksheetFunction' failed"

The date is being stored as a string in the format '20-01-2022' (day-month-year), which is presumably causing the error. I need to convert it to an actual datetime that Excel will recognise upon opening the file.

Example:

import pandas as pd

df = pd.DataFrame([[1, '21-06-2022'], [2, '19-08-2022'], [3, '06-04-2022']], columns=['id', 'date'])

df.to_excel("output.xlsx")

If you then open "output.xlsx" and try to use conditional formatting on the 'date' column, or try =AVERAGE(C2:C4), either nothing happens or you get an error. If you double-click into a cell, Excel suddenly recognises it as a date, but that workaround isn't practical with thousands of cells.

How can I convert the dates to a format that Excel will recognise immediately upon opening the file?

S7ewie
  • Please, check [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/4046632). Then post [mre]. The sql part is irrelevant as long as you create sample DF with proper column types. Most likely your date column in DF is string, not datetime object – buran Jan 25 '23 at 16:58
  • Also make sure that there is indeed problem with date in excel and not some other problem with your VBA code – buran Jan 25 '23 at 16:59

1 Answer


Before saving your df to Excel, you need to parse those date strings into actual dates. (Note that '20-01-2022' is day-month-year, not ISO 8601, so the format matters.)

There are several ways to do that.

You can use the parse_dates keyword argument of pandas.read_sql to parse specific columns as dates while loading, optionally specifying the format of each column.

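import pandas as pd

# sql and con are placeholders for your query and database connection.
df = pd.read_sql(
    sql,
    con,
    parse_dates={
        # strftime-style formats; %Y is a four-digit year, %y a two-digit one.
        "<col1>": {"format": "%d-%m-%Y"},  # e.g. '20-01-2022'
        "<col2>": {"format": "%d/%m/%Y"},  # e.g. '20/01/2022'
    },
)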

Same as above, but without a format: the columns are parsed as datetimes, and the dates can then be extracted.

import pandas as pd

df = pd.read_sql(sql, con, parse_dates=["<col1>", "<col2>"])
# .dt is a Series accessor, so apply it column by column.
df[["<col1>", "<col2>"]] = df[["<col1>", "<col2>"]].apply(lambda col: col.dt.date)

You can load then parse manually with pd.to_datetime, and again extract the dates only.

import pandas as pd

df = pd.read_sql(sql, con)
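# pd.to_datetime on a whole DataFrame expects year/month/day columns,
# so convert each column separately; adjust the format to your data.
df[["<col1>", "<col2>"]] = df[["<col1>", "<col2>"]].apply(
    lambda col: pd.to_datetime(col, format="%d-%m-%Y").dt.date
)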
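
Applied to the sample DataFrame from the question, the pd.to_datetime approach might look like the minimal sketch below. It assumes the strings are day-month-year (format "%d-%m-%Y"), as in the question, and it keeps the column as pandas datetimes rather than extracting plain dates.

```python
import pandas as pd

# Sample data copied from the question: dates stored as day-month-year strings.
df = pd.DataFrame(
    [[1, "21-06-2022"], [2, "19-08-2022"], [3, "06-04-2022"]],
    columns=["id", "date"],
)

# Parse the strings with an explicit format so day and month are never swapped.
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

# The column now has a datetime dtype instead of object (string).
print(df.dtypes)
```

Calling df.to_excel("output.xlsx") on the converted frame, as in the question, should then write real date cells rather than text.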

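Or, if the strings really are ISO 8601 (YYYY-MM-DD), you could just parse them with datetime.date.fromisoformat. This will not work on day-month-year strings such as '20-01-2022'.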

import pandas as pd
from datetime import date

df = pd.read_sql(sql, con)
# In pandas >= 2.1, applymap is deprecated in favour of DataFrame.map.
df[["<col1>", "<col2>"]] = df[["<col1>", "<col2>"]].applymap(date.fromisoformat)
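
For strings that are not ISO 8601, such as the day-month-year values in the question, an element-wise alternative is datetime.strptime with an explicit format. This is a rough sketch on the question's sample data; the helper name parse_dmy is made up for illustration.

```python
from datetime import date, datetime

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    [[1, "21-06-2022"], [2, "19-08-2022"], [3, "06-04-2022"]],
    columns=["id", "date"],
)


def parse_dmy(text: str) -> date:
    """Hypothetical helper: parse a 'DD-MM-YYYY' string into a datetime.date."""
    return datetime.strptime(text, "%d-%m-%Y").date()


# Series.map applies the parser to every value in the column.
df["date"] = df["date"].map(parse_dmy)
print(df)
```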

NB: the methods are listed in no particular order, but the first seems slightly faster than the others, and is also the most elegant (in my opinion).

ljmc
  • Thank you! Adding `parse_dates=["date"]` to the `pd.read_sql()` call worked for me. Do you know if it's possible to format a datetime to display as `"day-month-year"` WITHOUT converting it back to a string? I imagine that's something I'll have to do in Excel, as it's Excel that decides how to display it in its own GUI? – S7ewie Jan 26 '23 at 09:52
  • Exactly, that will be an Excel formatting issue. You can probably do it in Python via openpyxl, but I'm not familiar with a way to do it directly in `df.to_excel`. – ljmc Jan 26 '23 at 10:08