6

I think this should be easy but I'm hitting a bit of a wall. I have a dataset that was imported into a pandas dataframe from a Stata .dta file. Several of the columns contain date data. The dataframe contains 100,000+ rows but a sample is given:

   cat  event_date  total
0   G2  2006-03-08     16
1   G2         NaT    NaN
2   G2         NaT    NaN
3   G3  2006-03-10     16
4   G3  2006-08-04     12
5   G3  2006-12-28     13
6   G3  2007-05-25     10
7   G4  2006-03-10     13
8   G4  2006-08-06     19
9   G4  2006-12-30     16

The data is stored as a datetime64 format:

>>> mydata[['cat','event_date','total']].dtypes
cat                    object
event_date     datetime64[ns]
total                 float64
dtype: object

All I would like to do is create a new column which gives the difference in days (rather than 'us' or 'ns'!!!) between the event_date and a start date, say 2006-01-01. I've tried the following:

>>> mydata['new'] = mydata['event_date'] - np.datetime64('2006-01-01')

… but I get the message:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I've also tried a lambda function but that doesn't work either.

However, if I wanted to simply add on one day to each date I can successfully use:

>>> mydata['plusone'] = mydata['event_date'] + np.timedelta64(1,'D')

That works fine.

Am I missing something straightforward here?

Thanks in advance for any help.

user1718097
  • 4,090
  • 11
  • 48
  • 63
  • Your problem is strange, I can't explain but the following worked for me: `import datetime as dt mydata['new'] = mydata['event_date'] - dt.datetime(2006,1,1)` could you confirm – EdChum Aug 12 '14 at 07:15
  • 1
    I work pretty much avoid using ``np.datetime64`` directly in any event, that said, this is a bug: https://github.com/pydata/pandas/issues/7996 – Jeff Aug 12 '14 at 11:41

2 Answers2

6

Not sure why the numpy datetime64 is incompatible with pandas dtypes but using datetime objects worked fine for me:

In [39]:

import datetime as dt
mydata['new'] = mydata['event_date'] - dt.datetime(2006,1,1)
mydata
Out[39]:
      cat event_date  total      new
Index                               
0      G2 2006-03-08     16  66 days
1      G2        NaT    NaN      NaT
2      G2        NaT    NaN      NaT
3      G3 2006-03-10     16  68 days
4      G3 2006-08-04     12 215 days
5      G3 2006-12-28     13 361 days
6      G3 2007-05-25     10 509 days
7      G4 2006-03-10     13  68 days
8      G4 2006-08-06     19 217 days
9      G4 2006-12-30     16 363 days
EdChum
  • 376,765
  • 198
  • 813
  • 562
  • Works perfectly - thanks very much! But I'm still slightly confused - when date and time information is stored in pandas dataframe as datetime64, isn't that a numpy format? And, if so, why does datetime work but datetime64 doesn't? – user1718097 Aug 12 '14 at 12:18
  • @user1718097 that is what confuses me, I don't have an answer for that, hopefully one of the Pandas dev's may comment – EdChum Aug 12 '14 at 12:18
  • Just seen Jeff's comment, this is a bug, should get fixed in a future release, you can upvote too ;) – EdChum Aug 12 '14 at 12:19
  • @user1718097 best to post bugs direct to https://github.com/pydata/pandas/issues rather than here on SO, that's if you know it is a bug which can be difficult to tell sometimes – EdChum Aug 12 '14 at 13:11
2

Ensure you have an upto date version of pandas and numpy (>=1.7):

In [11]: df.event_date - pd.Timestamp('2006-01-01')
Out[11]:
0    66 days
1        NaT
2        NaT
3    68 days
4   215 days
5   361 days
6   509 days
7    68 days
8   217 days
9   363 days
Name: event_date, dtype: timedelta64[ns]
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
  • Thanks for the info - works perfectly. A very similar solution was given at almost exactly the same time using dt.datetime() rather than pd.Timestamp(). I've marked that answer as accepted simply because I've used datetime() more frequently in the past. But this solution works equally well. – user1718097 Aug 12 '14 at 12:20