read xlsb file as pandas dataframe and parse the date column as datetime format

Question

I have a 'some.xlsb' file with some 10 columns, out of which 2 are DateTime column.

When I load using pandas the date-time column is parsed in a different form.

Explanations:

where DateTime value corresponding to 4/10/2021 11:50:24 AM - read as 44296.5

Below is the code I tried.

goods_df = pd.read_excel('some.xlsb',
                   engine='pyxlsb', sheet_name='goods_df')

goods_df_header = goods_df.iloc[1]
goods_df.columns = goods_df_header #set the header row as the df header
goods_df= goods_df[2:]
goods_df.head(2)

you might want to do the parsing to datetime after import, see e.g. [Convert Excel style date with pandas](https://stackoverflow.com/q/38454403/10197418). — FObersteiner, Jun 17 '21 at 08:34

score 2 · Answer 1 · edited Dec 16 '21 at 06:59

2

When you read xlsb file using pandas you will get excel time float value because xlsb convert datetime object into an float value before storing.

According to Microsoft 44296.5 means 44296.5 days passed since jan 1st 1900.

You need convert this into epoch and then date by using below formula( epoch value= number of sec passed since jan 1st 1970 00:00:00 ).

a = datetime.datetime.strftime((int(<datevalue from excel>)*86400)-2207520000, "%m/%d/%Y")

Or you can save this xlsb as xlsx and read it you will get exact datetime object.

edited Dec 16 '21 at 06:59

ZygD

22,092
39
79
102

answered Dec 16 '21 at 06:58

Aditya Sai

21
2

note, that this is not exact, because `2207520000` corresponds to 70 years in seconds assuming 365 days per year (which is not correct). The exact number one can find in https://stackoverflow.com/a/34721924/9640384 – Mischa Lisovyi Apr 20 '22 at 06:54

read xlsb file as pandas dataframe and parse the date column as datetime format

1 Answers1