3

I have a column 'datedif' in my dataframe as :

  exposuredate min_exposure_date    datedif
  2014-10-08   2014-09-27           11 days
  2014-10-09   2014-09-27           12 days
  2014-09-27   2014-09-27           0 days
  2014-09-27   2014-09-27           0 days
  2014-10-22   2014-09-27           25 days

  data.exposuredate = pd.to_datetime(data.exposuredate)
  data.min_exposure_date = pd.to_datetime(data.min_exposure_date)

  data['datedif'] = ((data.exposuredate)-(data.min_exposure_date))

The format for the columns are datetime64[ns]. I want to extract the number of days in the field 'datedif'. I am not able to find anything which could help me to extract the difference in terms of number of days.

I tried:

data['datedif_day'] = data['datedif'].dt.days

Error:

AttributeError: 'Series' object has no attribute 'dt'

shaik moeed
  • 5,300
  • 1
  • 18
  • 54
Pulkit Jha
  • 1,709
  • 3
  • 12
  • 18
  • That's nice. Good luck with that. Did you have a question? – Marc B Jan 27 '15 at 14:11
  • quoting mysel again : " I want to extract number of days in the field 'datedif' ". That means how many days are there in that datefield 'datedif'? – Pulkit Jha Jan 27 '15 at 14:12
  • So what have you tried, and what precisely is the problem with your current code? This **is not** a code-writing service. – jonrsharpe Jan 27 '15 at 14:14
  • @jonrsharpe Have updated the question. – Pulkit Jha Jan 27 '15 at 14:22
  • *"... and what precisely is the problem with your current code?"* – jonrsharpe Jan 27 '15 at 14:23
  • current code is working fine. I want to know what should be the code if i want to extract number of days from the datedifference which i have calculated. – Pulkit Jha Jan 27 '15 at 14:24
  • 1
    Again, this isn't a code-writing service. What research have you done? What have you tried? – jonrsharpe Jan 27 '15 at 14:26
  • http://stackoverflow.com/questions/151199/how-do-i-calculate-number-of-days-betwen-two-dates-using-python I have been going through link like this and many other posts but all of them have taken two dates separately and illustrated how to do that. None of them has shown this with date format 'yyyy-mm-dd'. – Pulkit Jha Jan 27 '15 at 14:28
  • 1
    Remove or comment out ```data.datedif = pd.to_datetime(data.datedif)``` - then ```datedif``` will be a [```Timedelta``` object](http://pandas.pydata.org/pandas-docs/stable/timedeltas.html#operations). – wwii Jan 27 '15 at 14:50
  • @wwii : Thanks, I have tried but getting an error. I have updated the error in question. – Pulkit Jha Jan 27 '15 at 15:03
  • If your question is about `pandas` API; you should mention it in the question and/or add it to the tags so that people who follow [tag:pandas] could see your question. – jfs Jan 27 '15 at 23:36

3 Answers3

9

The Pandas Docs relate to the types of conversions you are looking for as Frequency Conversion

The two options are 1) division by Timedelta or 2) type conversion. There is a subtle difference between the two as stated in the docs:

"Note that division by the numpy scalar is true division, while astyping is equivalent of floor division."

data = pd.DataFrame([("2014-10-08", "2014-09-27"),
                     ("2014-10-09", "2014-09-27"),
                     ("2014-09-27", "2014-09-27"),
                     ("2014-10-22", "2014-09-27")],
                    columns=["exposuredate", "min_exposure_date"])

data['datediff'] =   pd.to_datetime(data.exposuredate) 
                   - pd.to_datetime(data.min_exposure_date)

data['datediff'] / pd.Timedelta(1, unit='d')
data['datediff'].astype('timedelta64[D]')

Both operations yield:

0    11.0
1    12.0
2     0.0
3    25.0
Name: datediff, dtype: float64

If you are using the date difference as a feature for training a machine learning algorithm, it doesn't matter in which form they are represented as they should be normalised anyway. timedelta64[ns] is a perfectly fine for that. When it comes to visualisation purposes, see this post.

Hendrik F
  • 3,690
  • 3
  • 21
  • 24
3

The 'datedif' looks in days format but actually it is in seconds. So in order to get number of days for furhter use add the following line in the code:

   data['datedif'] = data['datedif'].astype(np.numpy64)
   data['datedif_day'] = (data['datedif']/86400000000000)
shaik moeed
  • 5,300
  • 1
  • 18
  • 54
Pulkit Jha
  • 1,709
  • 3
  • 12
  • 18
1

Came across this same question today and I think the following solutin is the easiest:

Setup:

df = pd.DataFrame([("2014-10-08", "2014-09-27"),
                     ("2014-10-09", "2014-09-27"),
                     ("2014-09-27", "2014-09-27"),
                     ("2014-10-22", "2014-09-27")],
                    columns=["exposuredate", "min_exposure_date"])

df['datediff'] =   pd.to_datetime(df.exposuredate) - pd.to_datetime(df.min_exposure_date)

    exposuredate    min_exposure_date   datediff
0   2014-10-08      2014-09-27          11 days
1   2014-10-09      2014-09-27          12 days
2   2014-09-27      2014-09-27          0 days
3   2014-10-22      2014-09-27          25 days

Solution:

df.datediff.apply(lambda x: x.days)

0    11
1    12
2     0
3    25
Name: datediff, dtype: int64
Allen Qin
  • 19,507
  • 8
  • 51
  • 67