29

I have to parse an xml file which gives me datetimes in Excel style; for example: 42580.3333333333.

Does Pandas provide a way to convert that number into a regular datetime object?

jpp
  • 159,742
  • 34
  • 281
  • 339
  • see also [How to convert a given ordinal number (from Excel) to a date](https://stackoverflow.com/q/29387137/10197418) – FObersteiner Apr 16 '21 at 06:16

4 Answers4

43

OK I think the easiest thing is to construct a TimedeltaIndex from the floats and add this to the scalar datetime for 1900,1,1:

In [85]:
import datetime as dt
import pandas as pd
df = pd.DataFrame({'date':[42580.3333333333, 10023]})
df

Out[85]:
           date
0  42580.333333
1  10023.000000

In [86]:
df['real_date'] = pd.TimedeltaIndex(df['date'], unit='d') + dt.datetime(1900,1,1)
df

Out[86]:
           date                  real_date
0  42580.333333 2016-07-31 07:59:59.971200
1  10023.000000 1927-06-12 00:00:00.000000

OK it seems that excel is a bit weird with it's dates thanks @ayhan:

In [89]:
df['real_date'] = pd.TimedeltaIndex(df['date'], unit='d') + dt.datetime(1899, 12, 30)
df

Out[89]:
           date                  real_date
0  42580.333333 2016-07-29 07:59:59.971200
1  10023.000000 1927-06-10 00:00:00.000000

See related: How to convert a python datetime.datetime to excel serial date number

gosuto
  • 5,422
  • 6
  • 36
  • 57
EdChum
  • 376,765
  • 198
  • 813
  • 562
  • Thanks for great answer. Is there a way to change the date to format ddmmyyyy in the same command? Appreciate any help here.. Also, I tried on my dataset and noticed each date was incremented by 2 using the first code, second code gave correct dates though. – R.A Mar 07 '21 at 16:07
29

you can directly parse with pd.to_datetime, with keywords unit='D' and origin='1899-12-30':

import pandas as pd

df = pd.DataFrame({'xldate': [42580.3333333333]})

df['date'] = pd.to_datetime(df['xldate'], unit='D', origin='1899-12-30')

df['date']
Out[2]: 
0   2016-07-29 07:59:59.999971200
Name: date, dtype: datetime64[ns]

further reading:

FObersteiner
  • 22,500
  • 8
  • 42
  • 72
15

You can use the 3rd party xlrd library before passing to pd.to_datetime:

import xlrd

def read_date(date):
    return xlrd.xldate.xldate_as_datetime(date, 0)

df = pd.DataFrame({'date':[42580.3333333333, 10023]})

df['new'] = pd.to_datetime(df['date'].apply(read_date), errors='coerce')

print(df)

           date                 new
0  42580.333333 2016-07-29 08:00:00
1  10023.000000 1927-06-10 00:00:00
jpp
  • 159,742
  • 34
  • 281
  • 339
-1

Another option to solve it with "apply lambda" for pd.Series:

xl_date = 42580

df['DATE SERIES'] = df['DATE SERIES'].apply(lambda x: datetime.fromordinal(datetime(1900, 1, 1).toordinal() + xldate - 2))