
Question

See the code below demonstrating the issue. A simple pandas DataFrame is created with one row and one column containing one datetime instance. Calling timestamp() on the datetime object returns 1581894000.0. Selecting the same value through the DataFrame (pandas returns it as a Timestamp) and calling timestamp() gives 1581897600.0. Using the pandas apply function to call datetime.timestamp on each row of column 'date' returns 1581894000.0 again. I would expect to get the same timestamp value in all three cases.

In[19]: d = datetime(2020, 2, 17)
In[20]: d.timestamp()
Out[20]: 1581894000.0    <-- (1)
In[21]: df = pd.DataFrame({'date': [d]})
In[22]: df
Out[22]:
        date
0 2020-02-17
In[23]: df['date'][0]
Out[23]: Timestamp('2020-02-17 00:00:00')
In[24]: df['date'][0].timestamp()
Out[24]: 1581897600.0    <-- (2)
In[25]: df['date'].apply(datetime.timestamp)
Out[25]:
0    1.581894e+09
Name: date, dtype: float64
In[26]: df['date'].apply(datetime.timestamp)[0]
Out[26]: 1581894000.0    <-- (3)

Values (1), (2) and (3) should be the same.
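
A self-contained version of the same comparison, as a minimal sketch assuming pandas is installed. Because the `datetime.timestamp` result depends on the machine's local timezone, the script checks a relationship instead of hard-coding that number: the gap between the two results should equal the local UTC offset at that date (obtained here with `d.astimezone()`, which attaches the system's local timezone to a naive datetime).

```python
from datetime import datetime

import pandas as pd

d = datetime(2020, 2, 17)  # naive: no tzinfo attached
df = pd.DataFrame({'date': [d]})

# pandas treats a naive Timestamp as if it were UTC
pandas_ts = df['date'][0].timestamp()

# The unbound base-class method treats the same wall-clock time as local time
stdlib_ts = df['date'].apply(datetime.timestamp)[0]

# System's local UTC offset at that wall-clock time (0.0 on a machine set to UTC)
local_offset = d.astimezone().utcoffset().total_seconds()

print(pandas_ts)                               # 1581897600.0 on any machine
print(stdlib_ts)                               # depends on the local timezone
print(pandas_ts - stdlib_ts == local_offset)   # True
```

On the asker's machine the gap is 3600 seconds (one hour ahead of UTC); on a machine set to UTC the two values coincide.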

Edit

Thanks to input from @ALollz, using pandas' to_datetime and Timestamp, as shown below, seems to fix the problem.

In[15]: d = pd.to_datetime(datetime(2020,2,17))
In[16]: d.timestamp()
Out[16]: 1581897600.0
In[17]: df = pd.DataFrame({'date': [d]}) 
In[18]: df
Out[18]: 
        date
0 2020-02-17
In[19]: df['date'][0]
Out[19]: Timestamp('2020-02-17 00:00:00')
In[20]: df['date'][0].timestamp()
Out[20]: 1581897600.0
In[21]: df['date'].apply(pd.Timestamp.timestamp)
Out[21]: 
0    1.581898e+09
Name: date, dtype: float64
In[22]: df['date'].apply(pd.Timestamp.timestamp)[0]
Out[22]: 1581897600.0
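
As an aside, `apply` calls a Python function once per row. A vectorized alternative, sketched here after the epoch-conversion idiom in the pandas time-series user guide, subtracts the Unix epoch and floor-divides by one second. Like `pd.Timestamp.timestamp`, it interprets naive values as UTC; unlike it, the result is whole seconds as int64 rather than floats.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({'date': [pd.to_datetime(datetime(2020, 2, 17))]})

# Whole seconds since the Unix epoch; naive values are interpreted as UTC
epoch_seconds = (df['date'] - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')

# Row-by-row version from the edit above, for comparison (float64)
per_row = df['date'].apply(pd.Timestamp.timestamp)

print(epoch_seconds[0], per_row[0])
```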
rindis

1 Answer


The problem is timezone awareness. pandas doesn't always play well with the datetime module, and some of its design decisions diverge from the standard library; in this case, how to handle timezone-unaware (naive) datetime objects.

This specific behavior seems to have been a design choice, based on this open issue:

Yah, for tz-naive we implement timestamp as if it were UTC. Among other things, this ensures that we get the same behavior regardless of where the code is running.

So to get a consistent answer, you need to localize the datetime to UTC, so that datetime.timestamp uses UTC instead of your machine's local timezone.
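
The same UTC localization can also be done with the standard library alone, since Python 3.2 ships `datetime.timezone.utc`. This is a sketch of that swapped-in alternative, not the pytz approach used in this answer; for a naive value that is meant to be UTC, `replace(tzinfo=...)` attaches the timezone without shifting the wall-clock fields.

```python
from datetime import datetime, timezone

import pandas as pd

my_date = datetime(2020, 2, 17)

# Attach UTC without changing year/month/day/hour/...
my_date_utc = my_date.replace(tzinfo=timezone.utc)

# Now the standard library and pandas agree, regardless of the machine's timezone
print(my_date_utc.timestamp())               # 1581897600.0
print(pd.to_datetime(my_date).timestamp())   # 1581897600.0
```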


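from datetime import datetime
import pandas as pd
import pytz

my_date = datetime(2020, 2, 17)              # naive
my_date_aware = pytz.utc.localize(my_date)   # same wall-clock time, tagged as UTC

# UTC aware is the same as pandas
datetime.timestamp(my_date_aware) - pd.to_datetime(my_date).timestamp()
# 0.0

# Naive is read as local time; on this machine (5 hours behind UTC) it differs by 5 h
datetime.timestamp(my_date) - pd.to_datetime(my_date).timestamp()
# 18000.0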
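
If the local-time interpretation is what you actually want, the opposite direction also works. A sketch, assuming the naive values are wall-clock times in the machine's own timezone: `datetime.astimezone()` with no argument attaches the system's local UTC offset, and an aware pd.Timestamp is no longer treated as UTC, so pandas then agrees with the standard library.

```python
from datetime import datetime

import pandas as pd

my_date = datetime(2020, 2, 17)

# Naive -> aware, using the system's local timezone (a fixed-offset tzinfo)
my_date_local = my_date.astimezone()

# Both now describe the same instant, so the POSIX timestamps match
print(pd.Timestamp(my_date_local).timestamp() == my_date.timestamp())  # True
```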
ALollz
  • Thanks! I think I understand the problem now, and I'm trying to stick with the functionality in pandas. See the edit to my question for how I fixed it using your suggestions. – rindis Feb 17 '20 at 05:30
  • @MartinRindarøy you can read a little more about it at https://stackoverflow.com/questions/34622302/python-datetime-timestamp-issue, but I think the issue is mostly that datetime.timestamp uses whatever your system determines to be "local time", so unless that's UTC, they won't match. – ALollz Feb 17 '20 at 05:37
  • @ALollz: is this the reason `datetime.timestamp(p)` returns a different value from `pd.Timestamp.timestamp(p)` for `p = pd.Timestamp(2020, 2, 17)`, even though `pd.Timestamp` is a subclass of `datetime.datetime`? – Andy L. Feb 17 '20 at 05:42
  • @ALollz: Thanks. I learned quite a few new things from your answer and links :) +1 – Andy L. Feb 17 '20 at 05:53
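
On Andy L.'s question, a small sketch of the method-resolution point: `pd.Timestamp` is a subclass of `datetime.datetime` that overrides `timestamp`. Calling `p.timestamp()` finds the pandas override (naive treated as UTC), while `datetime.timestamp(p)` explicitly names the base-class function, which reads the same wall-clock fields as local time. That is also why `apply(datetime.timestamp)` in the question matched the plain datetime result.

```python
from datetime import datetime

import pandas as pd

p = pd.Timestamp(2020, 2, 17)
print(isinstance(p, datetime))  # True

# Attribute lookup finds the pandas override (naive treated as UTC)
print(p.timestamp() == pd.Timestamp.timestamp(p))  # True

# Naming the base class bypasses the override (naive treated as local time)
print(datetime.timestamp(p) == datetime(2020, 2, 17).timestamp())  # True
```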