158

I want to subtract dates in 'A' from dates in 'B' and add a new column with the difference.

df
          A        B
one 2014-01-01  2014-02-28 
two 2014-02-03  2014-03-01

I've tried the following, but get an error when I try to include this in a for loop...

import datetime
date1=df['A'][0]
date2=df['B'][0]
mdate1 = datetime.datetime.strptime(date1, "%Y-%m-%d").date()
rdate1 = datetime.datetime.strptime(date2, "%Y-%m-%d").date()
delta =  (mdate1 - rdate1).days
print delta

What should I do?

JJJ
  • 1,009
  • 6
  • 19
  • 31
Jase Villam
  • 2,895
  • 6
  • 18
  • 21

5 Answers5

194

To remove the 'days' text element, you can also make use of the dt() accessor for series: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.html

So,

df[['A','B']] = df[['A','B']].apply(pd.to_datetime) #if conversion required
df['C'] = (df['B'] - df['A']).dt.days

which returns:

             A          B   C
one 2014-01-01 2014-02-28  58
two 2014-02-03 2014-03-01  26
Ricky McMaster
  • 4,289
  • 2
  • 24
  • 23
  • 5
    Great answer. In my case, `df['C'] = (df['B'] - df['A']).dt.days` did not work and I had to use `df['C'] = (df['B'] - df['A']).days`. Any idea why mine did not give the number of days as expected? – Samuel Nde Sep 21 '18 at 18:44
  • Nde - how exactly did it not work? Error or wrong values? Did you convert both A and B columns to datetime successfully? – Ricky McMaster Sep 24 '18 at 06:56
  • 3
    Both my columns are datetime (or `datetime64[ns]` to be precise). When I did `df['C'] = (df['B'] - df['A']).dt.days`, I got an attribute error that said **AttributeError: 'Timedelta' object has no attribute 'dt'**, so I tried **df['C'] = (df['B'] - df['A']).days** which gave me the desired answer. (Of course I am using my own dataframe not the one in the example above. Or could it be because I also have time in my date and not as in `2018-09-24 10:17:18.800277`) – Samuel Nde Sep 24 '18 at 17:20
  • How does this function rounds, if my original dataframe is with Hours:Minutes:Seconds? – PV8 Jun 27 '19 at 08:35
  • 1
    one small suggestion, use df['A'] = pd.to_datetime(df['A']) is much faster than using apply function, – Jacob2309 Mar 11 '23 at 02:25
123

Assuming these were datetime columns (if they're not apply to_datetime) you can just subtract them:

df['A'] = pd.to_datetime(df['A'])
df['B'] = pd.to_datetime(df['B'])

In [11]: df.dtypes  # if already datetime64 you don't need to use to_datetime
Out[11]:
A    datetime64[ns]
B    datetime64[ns]
dtype: object

In [12]: df['A'] - df['B']
Out[12]:
one   -58 days
two   -26 days
dtype: timedelta64[ns]

In [13]: df['C'] = df['A'] - df['B']

In [14]: df
Out[14]:
             A          B        C
one 2014-01-01 2014-02-28 -58 days
two 2014-02-03 2014-03-01 -26 days

Note: ensure you're using a new of pandas (e.g. 0.13.1), this may not work in older versions.

Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
  • 32
    Do can we get rid of the "days" portion in the result incase we just need to see the numeric value ie. -58, -26 in this case. – 0nir Oct 22 '14 at 17:24
  • 7
    to expand on @AndyHayden comment, that works but it should `pd.offsets.Day(1)` (with an 's'). I also usually negate it, so you get `(df['A'] - df['B']) / pd.offsets.Day(-1)` – dirkjot Oct 14 '15 at 18:54
  • 12
    However, if you want to do this on a whole Series you need `(df['A'] - df['B']) / np.timedelta64(-1, 'D')` for reasons that I don't fully understand. – dirkjot Oct 14 '15 at 19:05
  • @dirkjot Thanks for spotting the typo! IIRC this was fix in recent pandas, are you using 0.16.2 / 0.17? – Andy Hayden Oct 14 '15 at 19:27
  • I found that this was a bit buggy when there was missing data. The problems are that 1) the missing data doesn't have an `.isnull()` attribute and 2) it has a `.day` attribute but the non-missing data has a `.days` attribute. So after creating the new variable I ran a loop over each `obs`ervation that checks: `if hasattr(obs,'days')` then assign `obs.days` and else assign `np.nan`. – webelo Apr 26 '17 at 21:44
  • 5
    @webelo the DatetimeIndex/Series itself should have a `.dt.days` attribute which should be strongly preferred. – Andy Hayden Apr 26 '17 at 23:33
  • In case we have a constant date on one hand and a series on another hand we can do `d0 = pd.to_datetime(date(2017, 2, 28))` `d1 = df.Date` `delta = d1 - d0` – int soumen Sep 22 '21 at 17:17
14

A list comprehension is your best bet for the most Pythonic (and fastest) way to do this:

[int(i.days) for i in (df.B - df.A)]
  1. i will return the timedelta(e.g. '-58 days')
  2. i.days will return this value as a long integer value(e.g. -58L)
  3. int(i.days) will give you the -58 you seek.

If your columns aren't in datetime format. The shorter syntax would be: df.A = pd.to_datetime(df.A)

A.Kot
  • 7,615
  • 2
  • 22
  • 24
1

How about this:

times['days_since'] = max(list(df.index.values))  
times['days_since'] = times['days_since'] - times['months']  
times
tharindu_DG
  • 8,900
  • 6
  • 52
  • 64
Tom
  • 41
  • 5
0

Solution above did not work for me. If you are using a simple pd.to_datetime() to first convert it, later you can just use:

import numpy as np

df['C'] = df['A'] - df['B'] / np.timedelta64(1, 'D')
M--
  • 25,431
  • 8
  • 61
  • 93