1

I have a pandas DataFrame with a column 'dob' (date of birth), and I want to compute each person's age as of today's date.

I have used the datetime module to get today's date, subtracted the 'dob' field from it, and divided the result by 365 to get the age in years.

This is a rather crude approach, I concede; I am looking for hints on how to do it more elegantly.

# -*- coding: utf-8 -*-

import pandas as pd
from datetime import datetime
today = datetime.today()

df = pd.read_csv(pathtocsvfile, parse_dates=['dob'])

df['age'] = df['dob'].apply(lambda x: (today - x).days // 365)

I believe the code works as it is; however, I am not sure how much leap years can affect the result.

And I am looking for an elegant way to do this as well.

Malyaj
  • how about `today.year - df['dob'].dt.year`? – anky May 16 '19 at 13:57
  • It depends. Do you want the closest integer age, or are we flooring or ceiling all of them? Are someone 1.25 years old and someone 1.85 years old considered the same age (1)? – ALollz May 16 '19 at 14:00
  • Possible duplicate of [Age from birthdate in python](https://stackoverflow.com/questions/2217488/age-from-birthdate-in-python) then `df['Age'] = df['dob'].apply(calculate_age)` – It_is_Chris May 16 '19 at 14:04
  • @anky_91 Yes, that's working, thanks! – Malyaj May 16 '19 at 14:08

3 Answers

6

If you want the age in completed years, I would suggest this:

df['age'] = df['dob'].apply(
               lambda x: today.year - x.year - 
               ((today.month, today.day) < (x.month, x.day)) 
               )

rather than taking the number of days and dividing by 365, which is not always accurate and can give a wrong result.
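
To make the difference concrete, here is a minimal sketch comparing the two approaches on a single date pair. The function names `age_by_days` and `age_by_calendar` and the fixed reference date are illustrative assumptions, not part of the original code; the date is pinned so the result does not depend on when the snippet runs.

```python
from datetime import date

def age_by_days(dob, today):
    # Crude approach: whole days elapsed, floor-divided by 365
    return (today - dob).days // 365

def age_by_calendar(dob, today):
    # Year difference, minus 1 if this year's birthday is still ahead
    # (the boolean counts as 1 when True, 0 when False)
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

dob = date(2000, 5, 17)
today = date(2019, 5, 16)  # assumed reference date: one day before the 19th birthday

print(age_by_days(dob, today))      # 19: the extra leap days push the count past 19 * 365
print(age_by_calendar(dob, today))  # 18: the birthday has not been reached yet
```

The four leap days between the two dates are enough to make the day-based version report the birthday one day early.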

The same topic is also discussed here: [Age from birthdate in python](https://stackoverflow.com/questions/2217488/age-from-birthdate-in-python)
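
If `apply` with a lambda turns out to be slow on a large frame, the same comparison can probably be written with the `.dt` accessor instead. This is a sketch under a few assumptions: the helper name `age_in_years` is made up for illustration, the 'dob' column is already datetime64 with no missing values, and the sample data and fixed `today` are invented so the result is reproducible.

```python
import pandas as pd

def age_in_years(dob, today):
    """Completed years between a datetime64 Series `dob` and a Timestamp `today`."""
    # True where this year's birthday is still ahead of `today`
    birthday_pending = (dob.dt.month > today.month) | (
        (dob.dt.month == today.month) & (dob.dt.day > today.day)
    )
    # Subtract 1 for every row whose birthday has not happened yet this year
    return today.year - dob.dt.year - birthday_pending.astype(int)

# Hypothetical sample data standing in for the CSV in the question
df = pd.DataFrame({"dob": pd.to_datetime(["2000-11-20", "2000-05-16", "1965-12-17"])})
today = pd.Timestamp("2019-05-16")  # fixed reference date for the example

df["age"] = age_in_years(df["dob"], today)
print(df)
```

In real use, `today` would be `pd.Timestamp.today()` rather than a pinned date.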

PV8
ichafai
  • Could you help me understand why you subtracted `((today.month, today.day) < (x.month, x.day))` from `today.year - x.year`? – Malyaj May 16 '19 at 14:12
  • Let's say I was born on 2000-11-20 and today is 2019-05-16. If we just take `today.year - x.year`, it gives 19 years old, but I am only 18 because my birthday in 2019 hasn't been reached yet. I subtracted the boolean because it is True (counted as 1) exactly when my birthday is still to come this year. – ichafai May 16 '19 at 14:16
2

To calculate the age, apply the following algorithm:

  1. Represent the dob as a number in the format YYYYMMDD
  2. Represent the current date as a number in the format YYYYMMDD
  3. Subtract the first number from the second
  4. Divide the result by 10000
  5. Take the integer part of this result

In code:

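from datetime import datetime

dob = '17-12-1965'
dob_date = datetime.strptime(dob, '%d-%m-%Y')
now_date = datetime.today()

# Encode each date as the integer YYYYMMDD, subtract, then floor-divide by 10000
age = (
    (now_date.year * 10000 + now_date.month * 100 + now_date.day)
    - (dob_date.year * 10000 + dob_date.month * 100 + dob_date.day)
) // 10000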
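
The same YYYYMMDD arithmetic can be applied to a whole DataFrame column at once. The following is only a sketch: the helper `yyyymmdd`, the sample dates, and the fixed `today` are assumptions made for illustration, and it assumes 'dob' is a datetime64 column with no missing values and no dates in the future.

```python
import pandas as pd

def yyyymmdd(dates):
    # Encode each date in the Series as an integer such as 19651217
    return dates.dt.year * 10000 + dates.dt.month * 100 + dates.dt.day

# Hypothetical sample data
df = pd.DataFrame({"dob": pd.to_datetime(["1965-12-17", "2000-11-20"])})
today = pd.Timestamp("2019-05-16")  # fixed reference date for the example
today_num = today.year * 10000 + today.month * 100 + today.day  # 20190516

# Floor division by 10000 keeps only the completed years
df["age"] = (today_num - yyyymmdd(df["dob"])) // 10000
print(df)
```
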
Ajay Dabas
0

You are introducing inaccuracy by insisting on counting in years. Your purpose will just as well be served by an age in days, which you already have. Just drop the integer division by 365.
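
As a rough sketch of what that looks like without `apply`: subtracting the column from a timestamp gives a timedelta Series, and `.dt.days` extracts the whole days. The sample dates, the `age_days` column name, and the fixed `today` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"dob": pd.to_datetime(["2000-05-17", "2019-05-15"])})
today = pd.Timestamp("2019-05-16")  # fixed reference date for the example

# Timestamp minus a datetime64 Series yields timedeltas; .dt.days gives whole days
df["age_days"] = (today - df["dob"]).dt.days
print(df)
```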

BoarGules