Pandas datetime values messed up after saving df to excel and then reading back into a df

Question

jan_21=[datetime(2021,1,1) + timedelta(hours=i) for i in range(5)]


jan_21
    
[datetime.datetime(2021, 1, 1, 0, 0),
 datetime.datetime(2021, 1, 1, 1, 0),
 datetime.datetime(2021, 1, 1, 2, 0),
 datetime.datetime(2021, 1, 1, 3, 0),
 datetime.datetime(2021, 1, 1, 4, 0)]

prices = np.random.randint(1,100,size=(5,))

prices

[46 23 13 26 52]

df = pd.DataFrame({'datetime':jan_21, 'price':prices})

df

             datetime  price
0 2021-01-01 00:00:00     83
1 2021-01-01 01:00:00     60
2 2021-01-01 02:00:00     29
3 2021-01-01 03:00:00     97
4 2021-01-01 04:00:00     67

All good so far, this is how I expected the dataframe and datetime values to be displayed. The problem comes when I save the dataframe to an excel file and then read it back into a dataframe, the datetime values get messed up.

df.to_excel('price_data.xlsx', index=False)

new_df = pd.read_excel('price_data.xlsx')

new_df

                      datetime  price
0   2021-01-01 00:00:00.000000  83
1   2021-01-01 00:59:59.999999  60
2   2021-01-01 02:00:00.000001  29
3   2021-01-01 03:00:00.000000  97
4   2021-01-01 03:59:59.999999  67

I'd like df == new_df to evaluate to True

score 2 · Accepted Answer · answered Jun 22 '21 at 10:55

2

Against the backdrop of the likely cause of the issue (see sophros' answer), what you could do to - superficiously - circumvent the problem is converting the cells of df["datetime"] to strings before producing the excel file and then converting the strings to datetime again, after new_df has been created:

df["datetime"] = df["datetime"].dt.strftime("%m/%d/%Y, %H:%M:%S")
df.to_excel('price_data.xlsx', index=False)

new_df = pd.read_excel('price_data.xlsx')
new_df["datetime"] = pd.to_datetime(new_df["datetime"], format="%m/%d/%Y, %H:%M:%S")

answered Jun 22 '21 at 10:55

Georgy Kopshteyn

678
3
13

Yes, this is exactly what i just tried and it works! – Pruthvi Amin Jun 22 '21 at 11:00
1

@PruthviAmin Glad it works! If that is the case, you should accept the answer, though. – Georgy Kopshteyn Jun 23 '21 at 16:24

score 0 · Answer 2 · answered Jun 22 '21 at 10:44

The reason for the difference in time part of 00:59:59.999999 and 02:00:00.000001 and 03:59:59.999999 is most likely related to a slightly different binary representation of date/time types in Excel and Python or pandas.

The time is frequently stored as a float but the difference is when is the 0-th time (e.g. year 1 AC or 1970 - as in Linux; good explanation here). Therefore, the conversion may loose some least significant parts of the date/time and there is not much you can do about it but round it up or using approximate comparisons as with any float.

Pandas datetime values messed up after saving df to excel and then reading back into a df

2 Answers2