0

I have the following datetime object:

import pandas as pd
from datetime import datetime
t0=datetime.strptime("01/01/2011 00:00:00", "%d/%m/%Y %H:%M:%S")

here, t0 is my reference or start time of simulation. I wanted to convert it into total hours (but failed) so that I can add them to my Hours df column and finally convert into a datetime column that could start from 2021-01-01.

I have a following Hours column which calculates hours from the start time t0:

My model results in hours: 

    Hours
0   44317.0
1   44317.250393519
2   44317.500138889
3   44317.750462963
4   44318.00005787
5   44318.250266204
6   44318.500543981
7   44318.7503125
8   44319.000520833
9   44319.250729167
10  44319.500428241

In excel if I convert this hours into date format it becomes 2021-05-01, like this which is my expected output:

My expected output:



Hours
1   5/1/21 0:00
2   5/1/21 6:00
3   5/1/21 12:00
4   5/1/21 18:00
5   5/2/21 0:00
6   5/2/21 6:00
7   5/2/21 12:00
8   5/2/21 18:00
9   5/3/21 0:00
10  5/3/21 0:00

However, in python if I can converted this Hours column into a datetime column named date using pd.to_datetime(df.Hours)` it starts from 1970-01-01.

My python output which I don't want:

    Hours
0   1970-01-01 00:00:00.000044317
1   1970-01-01 00:00:00.000044317
2   1970-01-01 00:00:00.000044317
3   1970-01-01 00:00:00.000044317
4   1970-01-01 00:00:00.000044318
5   1970-01-01 00:00:00.000044318
6   1970-01-01 00:00:00.000044318
7   1970-01-01 00:00:00.000044318
8   1970-01-01 00:00:00.000044319
9   1970-01-01 00:00:00.000044319
10  1970-01-01 00:00:00.000044319

Please let me know how to convert it so that it starts from 1st May, 2021.

Solution: From Michael S. answere below: The Hours column is actually not hours but days and using pd.to_datetime(df.Hours, unit='d',origin='1900-01-01') will give the right results. The software that I am using also uses excel like epoch time of '1900-01-01' and mistakenly says the days as hours.

ZVY545
  • 384
  • 1
  • 13

1 Answers1

2

Here is an update to the answer with OP's edits and inputs. Excel is weird with dates, so if you have to convert your timestamps (44317 etc) to Excel's dates, you have to do some odd additions to put the dates in line with Excel's (Pandas and Excel have different "Start of Time" dates, that's why you are seeing the different values e.g. 1970 vs 2021). Your 44317 etc numbers are actually days and you have to add 1899-12-30 to those days:

hours = [44317.0, 44317.250393519, 44317.500138889, 44317.750462963, 
         44318.00005787, 44318.250266204, 44318.500543981, 44318.7503125, 
         44319.000520833, 44319.250729167, 44319.500428241]
df = pd.DataFrame({"Hours":hours})

t0=datetime.strptime("01/01/2011 00:00:00", "%d/%m/%Y %H:%M:%S")
df["Actual Date"] = pd.TimedeltaIndex(df['Hours'], unit='d') + datetime(1899, 12, 30)
# Alternateive is   pd.to_datetime(df.Hours, unit='d', origin='1899-12-30')

Output:

Hours   Actual Date
0   44317.000000    2021-05-01 00:00:00.000000000
1   44317.250394    2021-05-01 06:00:34.000041600
2   44317.500139    2021-05-01 12:00:12.000009600
3   44317.750463    2021-05-01 18:00:40.000003200
4   44318.000058    2021-05-02 00:00:04.999968000
5   44318.250266    2021-05-02 06:00:23.000025600
6   44318.500544    2021-05-02 12:00:46.999958400
7   44318.750313    2021-05-02 18:00:27.000000000
8   44319.000521    2021-05-03 00:00:44.999971199
9   44319.250729    2021-05-03 06:01:03.000028799
10  44319.500428    2021-05-03 12:00:37.000022400

There are ways to clean up the format, but this is the correct time as you wanted.

To match your output exactly, you can do this, just be aware that the contents of the cells in the column "Corrected Format" are now string values and not datetime values. If you want to use them as datetime values then you'll have to convert them back again:

df["Corrected Format"] = df["Actual Date"].dt.strftime("%d/%m/%Y %H:%M")

Output


    Hours                   Actual Date             Corrected Format
0   44317.000000    2021-05-01 00:00:00.000000000   01/05/2021 00:00
1   44317.250394    2021-05-01 06:00:34.000041600   01/05/2021 06:00
2   44317.500139    2021-05-01 12:00:12.000009600   01/05/2021 12:00
3   44317.750463    2021-05-01 18:00:40.000003200   01/05/2021 18:00
4   44318.000058    2021-05-02 00:00:04.999968000   02/05/2021 00:00
5   44318.250266    2021-05-02 06:00:23.000025600   02/05/2021 06:00
6   44318.500544    2021-05-02 12:00:46.999958400   02/05/2021 12:00
7   44318.750313    2021-05-02 18:00:27.000000000   02/05/2021 18:00
8   44319.000521    2021-05-03 00:00:44.999971199   03/05/2021 00:00
9   44319.250729    2021-05-03 06:01:03.000028799   03/05/2021 06:01
10  44319.500428    2021-05-03 12:00:37.000022400   03/05/2021 12:00
Michael S.
  • 3,050
  • 4
  • 19
  • 34
  • Thanks. I want to convert 2011-01-01 into total hours. My model gives me time in hours which I need to convert into 2021-01-01 format because the data is actually for 2021. Surprisingly in excel if I format the hours column into date it automatically starts from 2021 but in python it starts from 1970. – ZVY545 Aug 06 '22 at 01:22
  • 1
    What is your expected output? Because in your example, when you put it into Excel, it doesn't convert it to 2011-1-1, it converts it to 2011-5-1 – Michael S. Aug 06 '22 at 01:27
  • I guess I should start at the beginning, how are you getting those numbers (44317 etc) – Michael S. Aug 06 '22 at 01:39
  • So, I have a CFD model which gives me these results in hours. The model is running from 2021. So, those hours need to be converted back to datetime format because it's hard to gues the actual datetime from hours value. During model setup, I had a card where I set up the reference time from 01/01/2011 but in excel I didn't do anything with 01/01/2021. I just took the model results and used format button to check the datetime and it gives me the expected output which is right. – ZVY545 Aug 06 '22 at 01:48
  • 1
    Got it. Check my edit, you have to essentially convert it to Excel's datetime format – Michael S. Aug 06 '22 at 01:59
  • if I use `pd.to_datetime(df.hours, unit='h')` it gives the months and seconds but starts the origin year from 1980. Is there a option to set origin in python like in R we can set `as.POSIXct(1378182603, origin="1970-01-01")` ? Thanks. – ZVY545 Aug 06 '22 at 02:40
  • 1
    Try this: `pd.to_datetime(df.Hours, unit='d', origin='1899-12-30')` – Michael S. Aug 06 '22 at 02:52
  • Thanks much Michael. You saved my life today. The `Hours` column is not hours, those values are days. The software that I am using mistakenly says the header as `Absolute Date/time (hours)` which is wrong. Now if I use `pd.to_datetime(df.Hours, unit='d',origin='1900-01-01')` , it gives correct results. So, they are days, not hours. – ZVY545 Aug 06 '22 at 03:39
  • No problem! Yeah I figured they were days. The reason I did `1899-12-30` was because of [this answer here](https://stackoverflow.com/questions/38454403/convert-excel-style-date-with-pandas) but maybe it has changed since then – Michael S. Aug 06 '22 at 03:46