
I would like to calculate age based on birth date.

With lubridate, I would just run the following, as described in "Efficient and accurate age calculation (in years, months, or weeks) in R given birth date and an arbitrary date":

as.period(interval(start = birthdate, end = givendate))$year

However, when I tried to use mutate in dplyr to create the new variable, I ran into an error.

library(dplyr); library(lubridate)

birthdate <- ymd(c(NA, "1978-12-31", "1979-01-01", "1962-12-30"))
givendate <- ymd(c(NA, "2015-12-31", "2015-12-31", NA))

df <- data.frame(
    birthdate = birthdate,
    givendate = givendate)

The following works, though it returns every date and time component (year, month, day, hour, minute and second) rather than just the year:

df<-df %>% mutate(age=as.period(interval(start = birthdate, end = givendate)))

# df
#    birthdate  givendate                  age
# 1       <NA>       <NA>                 <NA>
# 2 1978-12-31 2015-12-31   37y 0m 0d 0H 0M 0S
# 3 1979-01-01 2015-12-31 36y 11m 30d 0H 0M 0S
# 4 1962-12-30       <NA>                 <NA>

The following does not work:

df<-df %>% 
       mutate(age=as.period(interval(start = birthdate, end = givendate))$year)

It gives an error:

Error in mutate_impl(.data, dots) : invalid subscript type 'closure'

I thought it might be because of the missing values. So, I tried:

df<-df %>% 
   mutate(age=as.period(interval(start = birthdate, end = givendate))) %>% 
   mutate(age=if_else(!is.na(age),age$year,age))

It also gives an error:

Error in mutate_impl(.data, dots) : object 'age' not found

HNSKD
  • @akrun When I apply the first `mutate`, I would already have `age` variable in the dataset. I apply `$year` on age as I thought I could extract the `year` of the period. – HNSKD Jan 18 '17 at 08:21
  • 'age' has the 'Period' class, which may not be supported within `mutate` – akrun Jan 18 '17 at 08:22

3 Answers


Within lubridate,

  • Period is an S4 class with a slot "year"
  • year is an S3 generic function with a method that extracts the year slot from a Period object.

See https://github.com/hadley/lubridate/blob/master/R/accessors-year.r for the accessor function that extracts the year component.

Therefore, the following will work

df %>% mutate(age = year(as.period(interval(start = birthdate, end = givendate))))
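A self-contained sketch of this approach on the question's data follows. The helper name age_years is hypothetical (it is not part of lubridate); it only wraps the year() accessor call shown above so it can be reused inside mutate.

```r
library(dplyr)
library(lubridate)

# Sample data copied from the question
df <- data.frame(
  birthdate = ymd(c(NA, "1978-12-31", "1979-01-01", "1962-12-30")),
  givendate = ymd(c(NA, "2015-12-31", "2015-12-31", NA))
)

# age_years is a hypothetical helper name, not a lubridate function.
# It returns the completed years between two date vectors.
age_years <- function(birth, given) {
  year(as.period(interval(start = birth, end = given)))
}

result <- df %>% mutate(age = age_years(birthdate, givendate))
result$age
# NA 37 36 NA
```

Rows with a missing birth date or given date stay NA, so no separate is.na() check should be needed.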
mnel

We can use the year function from lubridate to get the difference between the calendar years of two dates.

library(dplyr); library(lubridate)
df %>% mutate(age = year(givendate) - year(birthdate))

#   birthdate  givendate age
#1       <NA>       <NA>  NA
#2 1978-12-31 2015-12-31  37
#3 1979-01-01 2015-12-31  36
#4 1962-12-30       <NA>  NA
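Note that subtracting calendar years does not check whether the birthday has already passed in the given year. A minimal sketch of the difference, using an illustrative pair of dates (the variable names are made up for this example):

```r
library(lubridate)

birth <- ymd("1978-12-31")
given <- ymd("2015-12-30")  # one day before the 37th birthday

# Calendar-year difference: ignores month and day
calendar_diff <- year(given) - year(birth)
calendar_diff
# 37

# Completed years taken from the period between the two dates
period_age <- year(as.period(interval(start = birth, end = given)))
period_age
# 36
```

The two results agree only when the given date falls on or after the birthday in that year.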
Ronak Shah
  • I don't think that using year alone is accurate in calculating age. If a person's birth date is 1978-12-31 and the given date is 2015-12-30, he would still be 36 since his birthday has not passed yet. – HNSKD Jan 18 '17 at 08:45
  • @HNSKD ohh..yes! Because `year` only extracts the year part of the Date. The most straightforward approach would be as suggested by @Spacedman `as.period(interval(start = df$birthdate, end = df$givendate))$year` – Ronak Shah Jan 18 '17 at 09:48

We can use do

df %>%
   mutate(age=as.period(interval(start = birthdate, end = givendate))) %>%
   do(data.frame(.[setdiff(names(.), "age")], 
       age = ifelse(!is.na(.$age), .$age$year, .$age)))
#    birthdate  givendate age
#1       <NA>       <NA>  NA
#2 1978-12-31 2015-12-31  37
#3 1979-01-01 2015-12-31  36
#4 1962-12-30       <NA>  NA

Because as.period returns an S4 object of class Period, we may need the S4 slot operator @ to extract the year:

df %>% 
    mutate(age=as.period(interval(start = birthdate, end = givendate))) %>%
   .$age %>%
   .@year %>%
    mutate(df, age = .)
#  birthdate  givendate age
#1       <NA>       <NA>  NA
#2 1978-12-31 2015-12-31  37
#3 1979-01-01 2015-12-31  36
#4 1962-12-30       <NA>  NA
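The same slot extraction can also be written without the pipe chain. This is only a sketch on the question's data, assuming a plain column assignment is acceptable in place of mutate:

```r
library(lubridate)

# Sample data copied from the question
df <- data.frame(
  birthdate = ymd(c(NA, "1978-12-31", "1979-01-01", "1962-12-30")),
  givendate = ymd(c(NA, "2015-12-31", "2015-12-31", NA))
)

# Build the Period outside the data frame, then read its "year" slot with @
age_period <- as.period(interval(start = df$birthdate, end = df$givendate))
df$age <- age_period@year
df$age
# NA 37 36 NA
```

Only the plain numeric year vector is stored in the data frame here, so the Period object never becomes a column.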
akrun