from datetime import datetime
import pandas as pd

date = "2020-02-07T16:05:16.000000000"

# Convert using datetime (strip the last three digits, since %f accepts at most six)
t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f')

# Convert using pandas
t2 = pd.to_datetime(date)

# Subtract the dates
print(t1 - t2)

# Subtract the POSIX timestamps
print(t1.timestamp() - t2.timestamp())

In this example, my understanding is that both datetime and pandas should produce timezone-naive datetimes. Can anyone explain why the difference between the dates is zero, but the difference between the timestamps is not zero? It's off by 5 hours for me, which is my time zone offset from GMT.

Dan
  • "Warning Because naive datetime objects are treated by many datetime methods as local times, it is preferred to use aware datetimes to represent times in UTC. As such, the recommended way to create an object representing a specific timestamp in UTC is by calling datetime.fromtimestamp(timestamp, tz=timezone.utc)." From the Python datetime docs – Scott Boston Jun 29 '20 at 19:33
  • https://docs.python.org/3/library/datetime.html#timezone-objects – Scott Boston Jun 29 '20 at 19:33
  • Native datetime returns a time with an awareness of your local timezone. – Scott Boston Jun 29 '20 at 19:34

1 Answer

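Naive datetime objects of Python's datetime.datetime class are treated as local time by methods such as timestamp() and astimezone(). The docs state this, but it can be a brain-teaser to work with nevertheless. If you call the timestamp method on a naive object, Python assumes the wall-clock time is in your machine's local time zone and returns the corresponding POSIX timestamp, i.e. seconds since the epoch 1970-01-01 00:00:00 UTC, as it should.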
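
A minimal, standard-library-only sketch of that assumption (the variable names are just for illustration): attaching the local zone explicitly with astimezone() gives the same timestamp as the naive object, while reading the same wall-clock time as UTC shifts the result by the machine's UTC offset.

```python
from datetime import datetime, timezone

# Naive wall-clock time, same value as in the question
naive = datetime(2020, 2, 7, 16, 5, 16)

# astimezone() with no argument attaches the system's local time zone,
# making explicit the assumption that timestamp() makes silently.
local_aware = naive.astimezone()
print(naive.timestamp() == local_aware.timestamp())
# True

# Interpreting the same wall-clock time as UTC instead shifts the
# timestamp by exactly the local UTC offset on that date.
as_utc = naive.replace(tzinfo=timezone.utc)
shift = as_utc.timestamp() - naive.timestamp()
print(shift == local_aware.utcoffset().total_seconds())
# True
```

On a machine set to UTC the shift is simply 0.0, which is why this kind of discrepancy can go unnoticed on servers and only show up on a laptop.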

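Coming from the Python datetime object, the behavior of a naive pandas.Timestamp can be counter-intuitive (and I think it's not so obvious). Derived the same way from a tz-naive string, its timestamp method treats the wall-clock time as UTC, not as local time. That explains both results in the question: subtracting the two naive objects only compares their wall-clock values, which are identical, while the two timestamp calls interpret those values in different time zones. You can verify that by setting the tzinfo of the datetime object to UTC: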

from datetime import datetime, timezone
import pandas as pd

date = "2020-02-07T16:05:16.000000000"

t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f')
t2 = pd.to_datetime(date)

print(t1.replace(tzinfo=timezone.utc).timestamp() - t2.timestamp())
# 0.0

The other way around, you can make the pandas.Timestamp timezone-aware in your local time zone, e.g.

t3 = pd.to_datetime(t1.astimezone())
# e.g. Timestamp('2020-02-07 16:05:16+0100', tz='Mitteleuropäische Zeit')

# now both t1 and t3 represent my local time:
print(t1.timestamp() - t3.timestamp())
# 0.0

My bottom line is that if you know which timezone your timestamps represent, work with timezone-aware datetime objects, e.g. for UTC:

import pytz  # pytz matched what pandas used internally when this was written; datetime.timezone.utc should work as well

t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f').replace(tzinfo=pytz.UTC)
t2 = pd.to_datetime(date, utc=True)

print(t1 == t2)
# True
print(t1-t2)
# 0 days 00:00:00
print(t1.timestamp()-t2.timestamp())
# 0.0
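
To reproduce the 5-hour gap from the question independently of the machine's settings, here is a sketch that uses only the standard library. The fixed UTC-5 offset is an assumption standing in for the asker's local time zone; a real zone with DST rules would need zoneinfo or similar.

```python
from datetime import datetime, timedelta, timezone

date = "2020-02-07T16:05:16.000000000"
wall = datetime.strptime(date[:-3], "%Y-%m-%dT%H:%M:%S.%f")

# Assumed fixed offset, for illustration only
utc_minus_5 = timezone(timedelta(hours=-5))

# How naive datetime.timestamp() reads the value on a UTC-5 machine
as_local = wall.replace(tzinfo=utc_minus_5)
# How a naive pandas.Timestamp reads the same value
as_utc = wall.replace(tzinfo=timezone.utc)

# With aware objects, subtraction and timestamps agree on the gap
print(as_local - as_utc)
# 5:00:00
print(as_local.timestamp() - as_utc.timestamp())
# 18000.0
```

Once both objects carry a tzinfo, the subtraction no longer hides the difference the way it does for two naive objects.
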
FObersteiner
  • Thanks for the helpful explanation, it makes sense now. I wasn't realizing that a datetime object with tz=None would be aware of your local timezone. – Dan Jun 30 '20 at 15:17
  • @Dan: well, the naive datetime object *represents* the local time but is sort-of *unaware* of it - anyway, glad I could help ;-) – FObersteiner Jun 30 '20 at 15:24