
I have two columns in my DataFrame that I have converted to datetime. I'm trying to subtract these dates and find the difference in years. This is the code I'm using:

from dateutil.relativedelta import relativedelta
difference_in_years = relativedelta(x['start'], x['end']).year

However, I'm getting the following error message:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What is the issue?

petezurich
HHH

4 Answers


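relativedelta expects two scalar date/datetime values, not whole Series. Given Series, it tries to evaluate their truth value, which raises the ValueError. Process the rows one at a time with apply and axis=1, pass end before start so the difference is positive, and read the .years attribute (.year is the absolute-year field, not the difference):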

import pandas as pd

df = pd.DataFrame({'start':['2015-10-02','2014-11-05'],
                   'end':['2018-01-02','2018-10-05']})

df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])

from dateutil.relativedelta import relativedelta

df['y'] = df.apply(lambda x: relativedelta(x['end'], x['start']).years, axis=1)

Or use list comprehension:

df['y'] = [relativedelta(i, j).years for i, j in zip(df['end'], df['start'])]

print(df)
       start        end  y
0 2015-10-02 2018-01-02  2
1 2014-11-05 2018-10-05  3
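To see the cause on single values, here is a minimal sketch using two of the dates above as plain datetime objects (the variable names are illustrative). It shows that .years holds the relative difference, .year stays None, swapping the arguments flips the sign, and passing whole Series reproduces the error from the question.

```python
from datetime import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta

start = datetime(2014, 11, 5)
end = datetime(2018, 10, 5)

rd = relativedelta(end, start)   # end first -> positive difference
print(rd)                        # relativedelta(years=+3, months=+11)
print(rd.years)                  # 3: whole years in the difference
print(rd.year)                   # None: absolute-year field, unset here
print(relativedelta(start, end).years)  # -3: argument order sets the sign

# Passing whole Series makes relativedelta test their truth value,
# which pandas refuses with the ValueError from the question.
starts = pd.Series(pd.to_datetime(['2014-11-05']))
ends = pd.Series(pd.to_datetime(['2018-10-05']))
error = None
try:
    relativedelta(ends, starts)
except ValueError as exc:
    error = exc
print('ValueError:', error)
```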

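EDIT: if either column can contain missing values (NaT), relativedelta fails on those rows, so compute the difference only where both dates are present: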

import numpy as np
import pandas as pd

df = pd.DataFrame({'start':['2015-10-02','2014-11-05'],
                   'end':['2018-01-02',np.nan]})

df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])

from dateutil.relativedelta import relativedelta

m = df[['start','end']].notnull().all(axis=1)
df.loc[m, 'y'] = df[m].apply(lambda x: relativedelta(x['end'], x['start']).years, axis=1)
print(df)
       start        end    y
0 2015-10-02 2018-01-02  2.0
1 2014-11-05        NaT  NaN
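An alternative to the mask is to let a small helper return NaN for incomplete rows and reuse the list comprehension. This is only a sketch; whole_years is a hypothetical helper name, not part of pandas or dateutil.

```python
import numpy as np
import pandas as pd
from dateutil.relativedelta import relativedelta


def whole_years(end, start):
    """Whole years between two timestamps, or NaN if either is missing (hypothetical helper)."""
    if pd.isna(end) or pd.isna(start):
        return np.nan
    return relativedelta(end, start).years


df = pd.DataFrame({'start': pd.to_datetime(['2015-10-02', '2014-11-05']),
                   'end': pd.to_datetime(['2018-01-02', np.nan])})

# zip yields one Timestamp (or NaT) pair per row
df['y'] = [whole_years(e, s) for e, s in zip(df['end'], df['start'])]
print(df)
```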
jezrael
  • I'm getting the following error AssertionError: occurred at index 0. Is it because I have some NaT values? – HHH Dec 08 '18 at 16:40
  • @H.Z. - Yes, it is possible. Check edit - filter only non NaNs rows in both datetimes columns. – jezrael Dec 08 '18 at 16:43

You can divide the difference by a one-year Timedelta:

(df['end'] - df['start'])/pd.Timedelta(1, 'Y')

and round the result if needed.

In pandas v0.23.4 and later you can do

(df['end'] - df['start'])//pd.Timedelta(1, 'Y')

to get the whole-year difference directly.

Update: In pandas v0.25 and later, pd.Timedelta(1, 'Y') is not supported, because a year is not a consistent unit of measure (sometimes 365 days, sometimes 366). If the 365-day approximation is acceptable, you can do this instead:

(df['end'] - df['start'])/pd.Timedelta(365, 'D')
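A runnable sketch of the 365-day approximation on the first example data from jezrael's answer. It uses true division for fractional years and floor division for whole years; the column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'start': pd.to_datetime(['2015-10-02', '2014-11-05']),
                   'end': pd.to_datetime(['2018-01-02', '2018-10-05'])})

delta = df['end'] - df['start']         # timedelta64[ns] Series
one_year = pd.Timedelta(days=365)       # 365-day approximation of a year

df['years'] = delta / one_year          # fractional years (float)
df['whole_years'] = delta // one_year   # whole years, rounded down
print(df)
```

Because leap days are ignored, the whole-year result can be off by one near an anniversary compared with the relativedelta approach.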
Mattravel
ayorgo

You can divide a timedelta Series by a one-year unit and, if you need whole years, take the floor:

import numpy as np

# data from jezrael's second example (the second 'end' date is missing)

df['years'] = (df['end'] - df['start']) / np.timedelta64(1, 'Y')
df['years_floor'] = np.floor(df['years'])

print(df)

       start        end     years  years_floor
0 2015-10-02 2018-01-02  2.253297          2.0
1 2014-11-05        NaT       NaN          NaN
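The 2.253297 above matches treating a year as the average Gregorian year of 365.2425 days (823 days / 365.2425). Newer NumPy/pandas versions may refuse to mix the 'Y' unit with nanosecond timedeltas, so here is a sketch that spells that year length out explicitly with pd.Timedelta, assuming the same data. The name avg_year is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'start': pd.to_datetime(['2015-10-02', '2014-11-05']),
                   'end': pd.to_datetime(['2018-01-02', np.nan])})

# Average Gregorian year: 365.2425 days
avg_year = pd.Timedelta(days=365.2425)

df['years'] = (df['end'] - df['start']) / avg_year  # NaT rows become NaN
df['years_floor'] = np.floor(df['years'])           # whole years, NaN preserved
print(df)
```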
jpp
  • pandas also has Timedelta: `df['years'] = (df['end'] - df['start']) / pd.Timedelta(1, 'Y')` https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.Timedelta.html#pandas.Timedelta – Koo Dec 08 '18 at 20:48

Check this answer: "calculate the difference between two datetime.date() dates in years and months".

from dateutil import relativedelta as rdelta
from datetime import date
d1 = date(2001,5,1)
d2 = date(2012,1,1)
rd = rdelta.relativedelta(d2,d1)
print(rd)
relativedelta(years=+10, months=+8)
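To apply this to the two DataFrame columns from the question, here is a sketch that builds one relativedelta per row (end paired with start, as in the first answer) and keeps both the years and months fields. The column names are illustrative.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({'start': pd.to_datetime(['2015-10-02', '2014-11-05']),
                   'end': pd.to_datetime(['2018-01-02', '2018-10-05'])})

# One relativedelta per row; end first so the fields are positive
deltas = [relativedelta(e, s) for e, s in zip(df['end'], df['start'])]
df['years'] = [d.years for d in deltas]
df['months'] = [d.months for d in deltas]
print(df)
```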
Jorge