5

I have the following code from somebody else that has a similar problem, but the solution proposed does not work on my DataFrame. The code subtracts a Pandas DataFrame index from a given date:

my_date = pd.datetime.today()
MyDF['day_differential'] = (MyDF.index - my_date).days

Which is generating the following error in my DataFrame:

TypeError: Timestamp subtraction must have the same timezones or no timezones

How do I found out tz for both dates? How do I make them the same so I can calculate the number of days between them?

Luis Miguel
  • 5,057
  • 8
  • 42
  • 75
  • Actually they both must have same format , can you print `MyDF.index` ? – ZdaR Jan 31 '15 at 17:28
  • [2014-12-26 02:34:36+00:00, ..., 2007-01-30 18:57:58+00:00] Length: 50, Freq: None, Timezone: UTC – Luis Miguel Jan 31 '15 at 17:33
  • 1
    As far I can see this is a list of timestamps , and you are doing something like `[1, 2, 3, 4, 5] - 9` I mean to say that you are subtracting an object from a list of objects, can you define clearly what do you want to achieve ? do you want to subtract my_date from all the dates in the MyDF or you are trying something else? – ZdaR Jan 31 '15 at 17:37
  • I want time elapsed (days, minutes, seconds) between all the dates in an array and today. – Luis Miguel Jan 31 '15 at 17:39
  • ok, working on it have patience – ZdaR Jan 31 '15 at 17:39
  • can you show how does `print pd.datetime.today()` look like ? – ZdaR Jan 31 '15 at 17:51
  • 2
    try `MyDF.index.tz_convert(None) - datetime.utcnow()` – jfs Feb 01 '15 at 02:56
  • 1
    @J.F.Sebastian you should post that as the answer so it can be accpeted and this question doesn't stay unanswered – EdChum Feb 01 '15 at 09:57
  • 1
    @EdChum you may go ahead and expand the comment into an answer – jfs Feb 01 '15 at 10:38
  • Thanks @J.F.Sebastian and EdChum ! The answer of Sebastian worked well, and the explanation from Ed expanded was great. However, I found a very strange behavior running the code (same python version, numpy, and pandas, etc.) on different machines (Ubuntu vs Mac). There must be some bug, because some solutions were OS dependent. So, I have two solutions now, both wok well, but Sebastian's answer is the more pythonic. Cheers – Luis Miguel Feb 01 '15 at 18:47

1 Answers1

6

Here is an answer using J.F. Sebastian's comment thanks really to him, because your index has timezone information then the operations must also be timezone aware, in your case the timezone is utc so you need to generate a utc timestamp to perform the subtraction:

In [11]:

import pandas as pd
import numpy as np
import datetime as dt
my_date = pd.datetime.today()
MyDF = pd.DataFrame({'a':np.random.randn(5)})
MyDF.index = pd.date_range('1/1/2011', periods=5, freq='H', tz='utc')
MyDF['day_differential'] = MyDF.index.tz_convert(None) - dt.datetime.utcnow()
MyDF
Out[11]:
                                  a            day_differential
2011-01-01 00:00:00+00:00  1.399602 -1493 days +13:04:06.875715
2011-01-01 01:00:00+00:00 -1.962517 -1493 days +14:04:06.875715
2011-01-01 02:00:00+00:00 -1.574531 -1493 days +15:04:06.875715
2011-01-01 03:00:00+00:00 -0.224702 -1493 days +16:04:06.875715
2011-01-01 04:00:00+00:00 -0.800772 -1493 days +17:04:06.875715

You can find out if your index is timezone aware by ouputting the index:

In [12]:

MyDF.index
Out[12]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00+00:00, ..., 2011-01-01 04:00:00+00:00]
Length: 5, Freq: H, Timezone: UTC

compare with a non timezone aware index:

In [14]:

MyDF.index
Out[14]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00, ..., 2011-01-01 04:00:00]
Length: 5, Freq: H, Timezone: None
EdChum
  • 376,765
  • 198
  • 813
  • 562