0

I have an excel file which I am using pd.read_excel() to read, inside the excel file are couple of date columns (the date data type is a string and must follow this format: dd/mm/yyyy. The problem I have is that when the excel file gets converted to a dataframe using pd.read_excel(), the values gets converted into an integer.

Does anyone know how I can maintain the value in the excel file after it has been converted to a dataframe.

Screenshot below: The columns with the date format enter image description here

What the values get converted to after converting the file to a dataframe enter image description here

"43800" is what the value of "Incorporation Date" got converted to.

What I have tried:

for column in columns_with_date_string:
    client_entity_df[column] = pd.to_datetime(
        client_entity_df[column].astype(int)
    )
    client_entity_df[column] = client_entity_df[column].dt.strftime('%d/%m/%Y')

This approach returned the values as "01/01/1970", instead of the dates specified

TLDR:

I basically want to maintain the value of my date columns (12/11/2022) in my excel file where the format is "dd/mm/yyy" when the excel file gets converted to a dataframe, pandas currently changes the values to an integer (which I assume is an epoch) when it converts the file to an integer.

Babayega
  • 33
  • 6
  • It's not Pandas converting the date, it's Excel itself. That's how it stores dates internally. – Mark Ransom Dec 16 '22 at 17:48
  • I see, so I guess it was that internal representation that Pandas was picking up? – Babayega Dec 16 '22 at 17:51
  • Yes. You can see it in Excel by changing the cell format from date to numeric. It's the number of days since [1899-12-30](https://stackoverflow.com/q/3963617/5987). – Mark Ransom Dec 16 '22 at 18:48

2 Answers2

0

You can use:

df_excel = pd.read_excel(file, dtype=object)

All columns became a object type and after that you can convert in another type if you need:

pd.to_datetime(df_excel['column_name'])
Inq
  • 43
  • 5
0

So I got a solution to it:

for column in columns_with_date_string:
    df[column] = df[column].apply(
        lambda x: datetime.fromordinal(datetime(1900, 1, 1).toordinal() + int(x) - 2)
    )
    df[column] = df[column].dt.strftime('%m/%d/%Y')

This SO answer was helpful.

Babayega
  • 33
  • 6