I have some code I wrote using Pandas which does the exact processing I want, but unfortunately is slow. In an effort to speed up processing times, I have gone down the path of converting the dataframe to a list of tuples, where each tuple is a row in the dataframe.
I have found that the datetime.datetime objects are converted to long ints, 1622623719000000000 for example.
I need to calculate the time difference between each row, so my thought was: 'OK, I'm not great at Python/pandas, but I know I can do datetime.fromtimestamp(1622623719000000000) to get a datetime object back.'
Unfortunately, datetime.fromtimestamp(1622623719000000000) throws OSError: [Errno 22] Invalid argument.
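For context, here is a minimal sketch of why that call fails, assuming the long int is a count of nanoseconds since the Unix epoch (which is how a datetime64[ns] column stores its values). datetime.fromtimestamp expects seconds, so the raw value, and the value divided by 1e3, both land far beyond year 9999. The exact exception type depends on the platform, so the sketch catches the three that can plausibly occur; the variable names are only illustrative.

```python
from datetime import datetime

ns_value = 1622623719000000000  # assumed: nanoseconds since the Unix epoch

errors = []
for candidate in (ns_value, ns_value / 1e3):
    try:
        datetime.fromtimestamp(candidate)
    except (OSError, OverflowError, ValueError) as exc:
        # Windows reports OSError [Errno 22]; other platforms may raise
        # ValueError or OverflowError for the same out-of-range input.
        errors.append(type(exc).__name__)

print(errors)

# Integer division by 10**9 gives whole seconds, which is in range.
seconds = ns_value // 10**9
print(seconds)  # 1622623719
```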
So, off to Google/SO to find a solution. I find this example which shows dividing the long int by 1e3. I try that, but still get 'invalid argument.'
I play around with the division of the long int, and dividing by 1e9 gets me the closest to the original datetime.datetime value, but not quite.
How do I successfully convert the long int back to the correct datetime value?
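One possible explanation for the "not quite" part, sketched under the assumption that the int is nanoseconds since the epoch in UTC: datetime.fromtimestamp without a tz argument returns local time, so the result is shifted by the machine's UTC offset. Passing tz=timezone.utc avoids that shift. The later_ns value below is a made-up second row, used only to show that a row-to-row difference can be computed from the ints directly.

```python
from datetime import datetime, timedelta, timezone

ns_value = 1622623719000000000  # assumed: nanoseconds since the Unix epoch (UTC)

# Split into whole seconds and leftover nanoseconds to avoid float rounding.
seconds, remainder_ns = divmod(ns_value, 10**9)

utc_dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
utc_dt += timedelta(microseconds=remainder_ns // 1000)

# Drop the tzinfo to get a naive datetime comparable to a tz-naive pandas value.
naive_dt = utc_dt.replace(tzinfo=None)
print(naive_dt)  # 2021-06-02 08:48:39

# Hypothetical next row, 90 seconds later, to illustrate a time difference.
later_ns = ns_value + 90 * 10**9
delta = timedelta(microseconds=(later_ns - ns_value) // 1000)
print(delta)  # 0:01:30
```

Note that this value decodes to 08:48:39 UTC, while the dataframe row printed in the test output below shows 08:16:33, so it may be worth checking that index 5 of the tuple list and index 5 of the dataframe refer to the same row. That is a guess based only on the numbers shown.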
Code to convert string format to datetime:
df.start_time = pd.to_datetime(df.report_date + " " + df.start_time)
Info on dataframe:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46 entries, 0 to 45
Data columns (total 19 columns):
report_date 46 non-null object
...
...
...
start_time 46 non-null datetime64[ns]
...
...
...
dtypes: datetime64[ns](1), float64(7), int64(1), object(10)
memory usage: 6.9+ KB
None
My test code:
print("DF start time", df.start_time[5], "is type", type(df.start_time[5]))
print("list start time", tup_list[5][7], "is type", type(tup_list[5][7]),"\n")
print("Convert int in row tuple to datetime")
print(datetime.fromtimestamp(int(1622623719000000000/1e9)))
Output:
DF start time 2021-06-02 08:16:33 is type <class 'pandas._libs.tslibs.timestamps.Timestamp'>
list start time 1622623719000000000 is type <class 'int'>
Convert int in row tuple to datetime
2021-06-02 03:48:39
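As a rough sketch of an alternative, the same round trip can stay inside pandas: pd.Timestamp accepts an integer as nanoseconds since the epoch, and Series.diff gives the difference between consecutive rows without leaving the dataframe. The two-row frame below is invented sample data that only mirrors the report_date and start_time columns described above; it is not the real dataset.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame(
    {
        "report_date": ["2021-06-02", "2021-06-02"],
        "start_time": ["08:16:33", "08:48:39"],
    }
)

# Same conversion as in the question, written with bracket access.
df["start_time"] = pd.to_datetime(df["report_date"] + " " + df["start_time"])

# Difference between each row and the previous one; the first row is NaT.
df["elapsed"] = df["start_time"].diff()
print(df["elapsed"])

# An integer is read as nanoseconds since the epoch, with no timezone shift.
restored = pd.Timestamp(1622623719000000000)
print(restored)  # 2021-06-02 08:48:39
```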