Python pandas add a years integer column to a date column

Question

I have a question somehow similar to what discussed here How to add a year to a column of dates in pandas however in my case, the number of years to add to the date column is stored in another column. This is my not working code:

import datetime
import pandas as pd
df1 = pd.DataFrame( [ ["Tom",5], ['Jane',3],['Peter',1]],  columns = ["Name","Years"])
df1['Date'] = datetime.date.today()
df1['Final_Date'] = df1['Date'] + pd.offsets.DateOffset(years=df1['Years'])

The goal is to add 5 years to the current date for row 1, 3 years to current date in row 2, eccetera. Any suggestions? Thank you

The issue, basically, is that pandas' time delta doesn't take a vectorised input. It throws `TypeError: cannot convert the series to `. — ifly6, Jul 13 '21 at 19:32
Right, that's exactly my problem. Would you know how to handle that? Thank you — Angelo, Jul 13 '21 at 19:35

ifly6 · Accepted Answer · 2022-12-13T21:34:04.690

3

Convert to time delta by converting years to days, then adding to a converted datetime column:

df1['Final_Date'] = pd.to_datetime(df1['Date']) \
    + pd.to_timedelta(df1['Years'] * 365, unit='D')

Use of to_timedelta with unit='Y' for years is deprecated and throws ValueError.

Edit. If you need day-exact changes, you will need to go row-by-row and update the date objects accordingly. Other answers explain.

edited Dec 13 '22 at 21:34

answered Jul 13 '21 at 19:39

ifly6

5,003
2
24
47

1

The problem with this method are the leap years. That said if OP is not at a couple of days around the exact month-day, then that is the simplest way :) – Ben.T Jul 13 '21 at 19:44

score 1 · Answer 2 · answered Jul 13 '21 at 19:37

Assuming the number of different values in Years is limited, you can try groupby and do the operation with pd.DateOffset like:

df1['new_date'] = (
    df1.groupby('Years')
       ['Date'].apply(lambda x: x + pd.DateOffset(years=x.name))
)
print(df1)
    Name  Years        Date   new_date
0    Tom      5  2021-07-13 2026-07-13
1   Jane      3  2021-07-13 2024-07-13
2  Peter      1  2021-07-13 2022-07-13

else you can extract year, month and day, add the Years column to year and recreate a datetime column

df1['Date'] = pd.to_datetime(df1['Date'])
df1['new_date'] = (
    df1.assign(year=lambda x: x['Date'].dt.year+x['Years'], 
               month=lambda x: x['Date'].dt.month,
               day=lambda x: x['Date'].dt.day, 
               new_date=lambda x: pd.to_datetime(x[['year','month','day']]))
       ['new_date']
)

same result

tbessho · Answer 3 · 2021-07-13T19:46:51.327

0

import datetime
import pandas as pd
df1 = pd.DataFrame( [ ["Tom",5], ['Jane',3],['Peter',1]],  columns = ["Name","Years"])
df1['Date'] = datetime.date.today()
df1['Final_date'] = datetime.date.today()

df1['Final_date'] = df1.apply(lambda g: g['Date'] + pd.offsets.DateOffset(years = g['Years']), axis=1)


print(df1)

Try this, you were trying to add the whole column when you called pd.offsets.DateOffset(years=df1['Years']) instead of just 1 value in the column.

EDIT: I changed from iterrows to a vectorization method due to iterrows's poor performance

edited Jul 13 '21 at 19:46

answered Jul 13 '21 at 19:32

tbessho

46
3

Don't use `iterrows` unless absolutely necessary; it's extremely poorly performant – ifly6 Jul 13 '21 at 19:32
I agree, as much as I appreciate the solution, I would need a vectorized solution instead – Angelo Jul 13 '21 at 19:35
@ifly6 you are absolutely right, I forgot to consider the performance issues. – tbessho Jul 13 '21 at 19:45
1

Hi Tomo and welcome to contribution. FYI, while `apply` for row-wise operation (axis=1) is a bit better than `iterrows`, it is still not a vectorized solution, [see this answer](https://stackoverflow.com/a/55557758/9274732) for example or [this question](https://stackoverflow.com/questions/54432583/when-should-i-not-want-to-use-pandas-apply-in-my-code) with timing examples :) – Ben.T Jul 13 '21 at 19:59

Python pandas add a years integer column to a date column

3 Answers3