I have a DataFrame with a date of birth (dob) and an inclusion date (ref). Since I work at an insurance company, it is important to know each person's age in any given month.

I tried writing a function like this:

def relativeAge(dob,ref):
  ref = pd.to_datetime(ref)
  dob= pd.to_datetime(dob)        
  for ind in dob:  
      return ref.dt.year - dob.dt.year - ((ref.dt.month,ref.dt.day)< (dob.dt.month,dob.dt.day))

So I could call it like this:

df['age'] = relativeAge(df['dob'], df['ref']) 

From what I understand, I have to put '.dt' before the attribute I am trying to access, otherwise I get this error:

AttributeError: 'Series' object has no attribute 'year'

Then, I got this:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So I tried putting .item() after each condition, and also using a lambda... but no cigar.

I am still learning how to properly iterate through columns and rows in pandas, and I believe that's why I am stuck on this problem. Any ideas? Do I need to iterate when calling the function?

Thanks!

marc_s
Suetam016

2 Answers

When working with pandas datetime objects, you can calculate a timedelta using the ordinary arithmetic operators.

import numpy as np
import pandas as pd

# Create the first dummy datetime series
dt1 = pd.Series(['2012-10-21 09:30', '2019-7-18 12:30', '2008-02-2 10:30',
                '2010-4-22 09:25', '2019-11-8 02:22'])

# Convert dt1 to datetime
dt1 = pd.to_datetime(dt1)

# Create the second dummy datetime series
dt2 = pd.Series(['2000-10-21 09:30', '1989-7-18 12:30', '1994-06-12 10:30',
                '2004-5-22 09:25', '2009-11-2 02:22'])

# Convert dt2 to datetime
dt2 = pd.to_datetime(dt2)

If you are just interested in finding the time between any two dates you can then just do:

dt_diff = dt1-dt2

Output:

0    4383 days
1   10957 days
2    4983 days
3    2161 days
4    3658 days
dtype: timedelta64[ns]

The result is expressed in days and gives the element-wise difference between the two series.

If you want the result in a specific unit, I would refer to this question; for me, the simplest way to get years was the following:

dt_diff / np.timedelta64(1, 'Y')

Output:

0    12.000246
1    29.999247
2    13.642991
3     5.916617
4    10.015264
dtype: float64
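
Depending on your pandas version, dividing by np.timedelta64(1, 'Y') may be rejected, because a year is not a fixed-length unit. As a hedged alternative, here is a minimal sketch that divides by an explicit year length instead. The constant SECONDS_PER_YEAR is my own name, not part of pandas, and it assumes an average year of 365.2425 days, which matches the fractional values shown above.

```python
import pandas as pd

# Two of the sample rows from above, written with zero-padded dates.
dt1 = pd.to_datetime(pd.Series(["2012-10-21 09:30", "2019-07-18 12:30"]))
dt2 = pd.to_datetime(pd.Series(["2000-10-21 09:30", "1989-07-18 12:30"]))
dt_diff = dt1 - dt2

# Assumption: one "year" is an average Gregorian year of 365.2425 days.
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60

# Fractional years between the two series, element-wise.
years = dt_diff.dt.total_seconds() / SECONDS_PER_YEAR
```

Note that the second row is exactly 30 calendar years apart but comes out slightly below 30, so this is an approximation rather than an exact age.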

I would refer to the answers in that question for other ways of formatting the time difference.
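
If what you need is the age in whole years, as in the question, the fractional result can be off by one around birthdays (the second row above gives 29.999247 for two dates exactly 30 years apart). Below is a rough sketch of an element-wise version of the question's function. The name relative_age and the sample data are made up for illustration. The tuple comparison from the question is replaced with explicit boolean masks, since comparing tuples of Series is what triggers the "truth value of a Series is ambiguous" error.

```python
import pandas as pd

def relative_age(dob, ref):
    """Whole years elapsed from dob to ref, computed element-wise."""
    dob = pd.to_datetime(dob)
    ref = pd.to_datetime(ref)
    # True where the birthday has not yet occurred in the reference year.
    before_birthday = (ref.dt.month < dob.dt.month) | (
        (ref.dt.month == dob.dt.month) & (ref.dt.day < dob.dt.day)
    )
    return ref.dt.year - dob.dt.year - before_birthday.astype(int)

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "dob": ["1989-07-18", "1994-06-12", "2004-05-22"],
    "ref": ["2019-07-18", "2008-02-02", "2010-04-22"],
})
df["age"] = relative_age(df["dob"], df["ref"])
```

No explicit loop is needed here: the comparisons and the subtraction operate on whole columns at once.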

bkeesey

You can try using relativedelta from the dateutil package.

import pandas as pd
from dateutil.relativedelta import relativedelta

# Whole years between ref and dob, computed row by row
df['age'] = df.apply(lambda x: relativedelta(pd.to_datetime(x.ref), pd.to_datetime(x.dob)).years, axis=1)
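
To see what this produces, here is a small self-contained sketch. The sample DataFrame is invented for illustration and only reuses the dob and ref column names from the question.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

# Hypothetical sample data; columns are already converted to datetime.
df = pd.DataFrame({
    "dob": pd.to_datetime(["1989-07-18", "1994-06-12"]),
    "ref": pd.to_datetime(["2019-07-18", "2008-02-02"]),
})

# relativedelta(later, earlier).years is the number of completed years.
df["age"] = df.apply(
    lambda row: relativedelta(row["ref"], row["dob"]).years, axis=1
)
```

Because apply with axis=1 calls the lambda once per row, this is likely to be slower than a column-wise calculation on large frames, but it handles the birthday boundary for you.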