1

I have a question somehow similar to what discussed here How to add a year to a column of dates in pandas however in my case, the number of years to add to the date column is stored in another column. This is my not working code:

import datetime
import pandas as pd
df1 = pd.DataFrame( [ ["Tom",5], ['Jane',3],['Peter',1]],  columns = ["Name","Years"])
df1['Date'] = datetime.date.today()
df1['Final_Date'] = df1['Date'] + pd.offsets.DateOffset(years=df1['Years'])

The goal is to add 5 years to the current date for row 1, 3 years to current date in row 2, eccetera. Any suggestions? Thank you

Angelo
  • 1,594
  • 5
  • 17
  • 50

3 Answers3

3

Convert to time delta by converting years to days, then adding to a converted datetime column:

df1['Final_Date'] = pd.to_datetime(df1['Date']) \
    + pd.to_timedelta(df1['Years'] * 365, unit='D')

Use of to_timedelta with unit='Y' for years is deprecated and throws ValueError.

Edit. If you need day-exact changes, you will need to go row-by-row and update the date objects accordingly. Other answers explain.

ifly6
  • 5,003
  • 2
  • 24
  • 47
  • 1
    The problem with this method are the leap years. That said if OP is not at a couple of days around the exact month-day, then that is the simplest way :) – Ben.T Jul 13 '21 at 19:44
1

Assuming the number of different values in Years is limited, you can try groupby and do the operation with pd.DateOffset like:

df1['new_date'] = (
    df1.groupby('Years')
       ['Date'].apply(lambda x: x + pd.DateOffset(years=x.name))
)
print(df1)
    Name  Years        Date   new_date
0    Tom      5  2021-07-13 2026-07-13
1   Jane      3  2021-07-13 2024-07-13
2  Peter      1  2021-07-13 2022-07-13

else you can extract year, month and day, add the Years column to year and recreate a datetime column

df1['Date'] = pd.to_datetime(df1['Date'])
df1['new_date'] = (
    df1.assign(year=lambda x: x['Date'].dt.year+x['Years'], 
               month=lambda x: x['Date'].dt.month,
               day=lambda x: x['Date'].dt.day, 
               new_date=lambda x: pd.to_datetime(x[['year','month','day']]))
       ['new_date']
)

same result

Ben.T
  • 29,160
  • 6
  • 32
  • 54
0
import datetime
import pandas as pd
df1 = pd.DataFrame( [ ["Tom",5], ['Jane',3],['Peter',1]],  columns = ["Name","Years"])
df1['Date'] = datetime.date.today()
df1['Final_date'] = datetime.date.today()

df1['Final_date'] = df1.apply(lambda g: g['Date'] + pd.offsets.DateOffset(years = g['Years']), axis=1)


print(df1)

Try this, you were trying to add the whole column when you called pd.offsets.DateOffset(years=df1['Years']) instead of just 1 value in the column.

EDIT: I changed from iterrows to a vectorization method due to iterrows's poor performance

tbessho
  • 46
  • 3
  • Don't use `iterrows` unless absolutely necessary; it's extremely poorly performant – ifly6 Jul 13 '21 at 19:32
  • I agree, as much as I appreciate the solution, I would need a vectorized solution instead – Angelo Jul 13 '21 at 19:35
  • @ifly6 you are absolutely right, I forgot to consider the performance issues. – tbessho Jul 13 '21 at 19:45
  • 1
    Hi Tomo and welcome to contribution. FYI, while `apply` for row-wise operation (axis=1) is a bit better than `iterrows`, it is still not a vectorized solution, [see this answer](https://stackoverflow.com/a/55557758/9274732) for example or [this question](https://stackoverflow.com/questions/54432583/when-should-i-not-want-to-use-pandas-apply-in-my-code) with timing examples :) – Ben.T Jul 13 '21 at 19:59