
I am trying to calculate the age of patients from a dataset. I initially managed to do that with a function, but it calculated the age from the birthdate to today. So I added an if statement for patients who have died: in that case I want to calculate the age from the birthdate to the death date.

Here is my code:

from datetime import datetime

def calculate_age(born, alive, death):
    # Age in whole years from the birthdate to now
    today = datetime.now()
    age_in_years = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    if alive == 'No':
        # Patient has died: measure up to the death date instead
        age_in_years1 = death.year - born.year - ((death.month, death.day) < (born.month, born.day))
        return age_in_years1
    else:
        return age_in_years

Then I applied the function:

df['age'] = df['birthdate'].apply(calculate_age,args = (df.alive, df.death))

This raises the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-61-bde1cb6c3981> in <module>()
----> 1 df['age'] = df['birthdate'].apply(calculate_age,args = (df.alive, df.death))

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can anyone help?

flmlopes
  • A small comment: your function is very confusing, in my opinion. I would recommend starting the if statement earlier. – Anton vBR Jun 19 '18 at 18:28
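Following that suggestion, here is a minimal sketch (not the asker's code) in which the end date is chosen first, so the age formula is written only once. It assumes, as in the question, that `alive` is `'No'` for patients who have died and that the dates are `datetime.date` or pandas `Timestamp` values. The optional `today` parameter is a hypothetical addition so the result can be checked against a fixed date.

```python
from datetime import date


def calculate_age(born, alive, death, today=None):
    """Age in whole years at death (if alive == 'No') or at `today`."""
    if today is None:
        today = date.today()
    # Decide the end date first, then compute the age only once
    end = death if alive == 'No' else today
    # Subtract one year if the birthday has not yet come around in the end year
    return end.year - born.year - ((end.month, end.day) < (born.month, born.day))


print(calculate_age(date(1990, 3, 8), 'No', date(2017, 6, 24)))  # 27
print(calculate_age(date(1995, 12, 2), 'Yes', None, today=date(2018, 6, 19)))  # 22
```

Note that this function still expects scalar arguments, so it has to be called once per row rather than with whole columns.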

2 Answers


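Apply the function row by row over the DataFrame, so that each call receives the scalar values from a single row: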

df['age'] = df.apply(lambda x: calculate_age(x.birthdate, x.alive, x.death), axis=1)
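Why this helps: `Series.apply` calls the function once per birthdate, but `args` passes the entire `df.alive` and `df.death` columns along with each call, so inside the function `alive == 'No'` is a boolean Series and Python cannot decide whether it is true. The sketch below reproduces the error and shows the row-wise version working. The two-row DataFrame is hypothetical (it only reuses the question's column names), and `calculate_age` here is a trimmed version of the question's function.

```python
import pandas as pd


def calculate_age(born, alive, death):
    """Whole years from born to death (if alive == 'No') or to today."""
    end = death if alive == 'No' else pd.Timestamp.today()
    return end.year - born.year - ((end.month, end.day) < (born.month, born.day))


# Hypothetical sample data using the question's column names
df = pd.DataFrame({
    'birthdate': pd.to_datetime(['1990-03-08', '1975-11-06']),
    'alive': ['No', 'No'],
    'death': pd.to_datetime(['2017-06-24', '2017-01-23']),
})

# Series.apply passes the whole alive/death columns as extra arguments,
# so `alive == 'No'` is a boolean Series and the condition raises ValueError
try:
    df['birthdate'].apply(calculate_age, args=(df.alive, df.death))
except ValueError as exc:
    print('ValueError:', exc)

# DataFrame.apply with axis=1 calls the function once per row, with scalars
df['age'] = df.apply(lambda x: calculate_age(x.birthdate, x.alive, x.death), axis=1)
print(df['age'].tolist())  # [27, 41]
```

Row-wise `apply` is easy to read but runs a Python function per row, so it can be slow on large datasets.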
koPytok

Here is an alternative, vectorized way to get the age from a date (e.g. the date of birth) using pandas:

import pandas as pd
import numpy as np

# Recreate a sample dataframe
np.random.seed(2018)
today = pd.Timestamp.today().normalize()  # today's date at midnight

df = pd.DataFrame({
    'birthday': [pd.Timestamp(1970, 1, 1) + pd.Timedelta(days=i)
                 for i in np.random.randint(10000, size=10)],
    'alive': np.random.choice(['yes', 'no'], size=10, p=[0.8, 0.2]),
    'death': [today - pd.Timedelta(days=i)
              for i in np.random.randint(1000, size=10)]
})

# Living patients have no death date
df.loc[df['alive'] == 'yes', 'death'] = pd.NaT

# Calculate age: use today for living patients and the death date otherwise,
# then convert days to whole years (365.2425 days, the average Gregorian year)
end = df['death'].where(df['alive'] == 'no', today)
df['age'] = ((end - df['birthday']).dt.days // 365.2425).astype(int)

# Display
print(df)

Returns (at the time of writing; the ages and death dates depend on the current date):

  alive   birthday       death  age
0   yes 1995-12-02         NaT   22
1    no 1977-09-26  2016-01-29   38
2   yes 1972-07-06         NaT   45
3   yes 1990-01-20         NaT   28
4   yes 1978-01-29         NaT   40
5   yes 1988-04-17         NaT   30
6   yes 1985-11-03         NaT   32
7    no 1975-11-06  2017-01-23   41
8    no 1990-03-08  2017-06-24   27
9   yes 1980-12-07         NaT   37
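Dividing by an average year length can be off by one for dates close to a birthday. If exact whole years are needed (the same rule as the question's function: subtract one if the birthday has not yet been reached in the end year), a vectorized sketch could compare year, month and day directly. The column names follow this answer; the three-row DataFrame and the fixed `today` date are hypothetical, chosen so the result is reproducible.

```python
import pandas as pd

# Hypothetical data; 'today' is fixed so the output is reproducible
today = pd.Timestamp('2018-06-19')
df = pd.DataFrame({
    'birthday': pd.to_datetime(['1995-12-02', '1977-09-26', '1990-03-08']),
    'alive': ['yes', 'no', 'no'],
    'death': pd.to_datetime([None, '2016-01-29', '2017-06-24']),
})

# End date: today for living patients, the death date otherwise
end = df['death'].where(df['alive'] == 'no', today)
born = df['birthday']

# True where the birthday has not yet come around in the end year
before_birthday = (end.dt.month < born.dt.month) | (
    (end.dt.month == born.dt.month) & (end.dt.day < born.dt.day)
)

# Whole calendar years; the boolean mask subtracts as 0 or 1
df['age'] = end.dt.year - born.dt.year - before_birthday.astype(int)
print(df['age'].tolist())  # [22, 38, 27]
```

This stays fully vectorized like the `np.where` version, while giving the same results as the question's per-row function.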
Anton vBR