I have looked at several other related questions here, here, and here, and none of them have come across quite the same problem as me.
I am using Pandas version 0.16.2. I have several columns in a Pandas dataframe, of dtype datetime64[ns]:
In [6]: date_list = ["SubmittedDate","PolicyStartDate", "PaidUpDate", "MaturityDate", "DraftDate", "CurrentValuationDate", "DOB", "InForceDate"]
In [11]: data[date_list].head()
Out[11]:
SubmittedDate PolicyStartDate PaidUpDate MaturityDate DraftDate \
0 NaT 2002-11-18 NaT 2041-03-04 NaT
1 NaT 2015-01-13 NaT NaT NaT
2 NaT 2014-10-15 NaT NaT NaT
3 NaT 2009-08-27 NaT NaT NaT
4 NaT 2007-04-19 NaT 2013-10-01 NaT
CurrentValuationDate DOB InForceDate
0 2015-04-30 1976-03-04 2002-11-18
1 NaT 1949-09-27 2015-01-13
2 NaT 1947-06-15 2014-10-15
3 2015-07-30 1960-06-07 2009-08-27
4 2010-04-21 1950-10-01 2007-04-19
These were originally in string format (e.g. '1976-03-04') which I converted to datetime objects using:
In [7]: for datecol in date_list:
...: data[datecol] = pd.to_datetime(data[datecol], coerce=True, errors = 'raise')
Here are the dtypes for each of these columns:
In [8]: for datecol in date_list:
print data[datecol].dtypes
returns:
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
So far, so good. But what I want to do is create a new column for each of these columns that gives the age in days (as an integer) from a certain date.
In [13]: current_date = pd.to_datetime("2015-07-31")
I first ran this:
In [14]: for i in date_list:
....: data[i+"InDays"] = data[i].apply(lambda x: current_date - x)
However, when I check the dtype of the returned columns:
In [15]: for datecol in date_list:
....: print data[datecol + "InDays"].dtypes
I get these:
object
timedelta64[ns]
object
timedelta64[ns]
object
timedelta64[ns]
timedelta64[ns]
timedelta64[ns]
I don't know why three of them are objects, when they should be timedeltas. What I want to do next is:
In [16]: for i in date_list:
....: data[i+"InDays"] = data[i+"InDays"].dt.days
This approach works fine for the timedelta columns. However, since three of the columns are not timedeltas, I get this error:
AttributeError: Can only use .dt accessor with datetimelike values
I suspect that there are some values in those three columns that are preventing Pandas from converting them to timedeltas. I can't figure out how to work out what those values might be.