I am trying to calculate age from two date columns. First, I convert any implausible dates of birth (dob), i.e. those later than today, to NA. Next, I try to calculate age using lubridate (solution from: https://stackoverflow.com/a/41730322/8772229), but I get an error message. Any advice on what is going wrong?

Data:

df <- data.frame(dob=as.Date(c("2020-09-26", "2017-12-01", NA)), today=as.Date(c("2020-09-25", "2020-09-25", "2020-09-25")))
df
         dob      today
1 2020-09-26 2020-09-25
2 2017-12-01 2020-09-25
3       <NA> 2020-09-25

Code:

library(dplyr)      # provides %>%, mutate() and case_when()
library(lubridate)
df %>% 
  mutate(
    # convert non-plausible dates to NA
    dob = case_when((dob > today) ~ as.Date(NA_character_), TRUE ~ as.Date(dob)),
    # calculate age
    age = year(as.period(interval(start = dob, end = today))))

Message:

Error in FUN(X[[i]], ...) : subscript out of bounds
EML

2 Answers

I get a different error, which comes from trying to extract the year from an NA period. Instead, you can use the time_length() function from lubridate to get the difference in years.

library(dplyr)
library(lubridate)

df %>% 
  mutate(dob= replace(dob, dob > today, NA),
         age= time_length(today-dob, 'years'))

#         dob      today      age
#1       <NA> 2020-09-25       NA
#2 2017-12-01 2020-09-25 2.817248
#3       <NA> 2020-09-25       NA
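
If a whole-number age is wanted rather than a fractional one, the result of time_length() can be rounded down. The following is only a minimal sketch, assuming the df from the question and that treating a year as 365.25 days is acceptable (so the result may be off by one on or right around a birthday). The column name age_years is made up for illustration.

```r
library(dplyr)
library(lubridate)

df <- data.frame(
  dob   = as.Date(c("2020-09-26", "2017-12-01", NA)),
  today = as.Date(c("2020-09-25", "2020-09-25", "2020-09-25"))
)

result <- df %>%
  mutate(
    # dates of birth later than today are implausible, so set them to NA
    dob = replace(dob, dob > today, NA),
    # fractional age in years, rounded down to completed years
    # (age_years is a hypothetical column name)
    age_years = floor(time_length(today - dob, "years"))
  )

result$age_years
```

Rows whose dob is NA simply end up with an NA age, so no separate handling of missing values is needed here.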
Ronak Shah

To get the difference between two dates use

as.vector(today - dob)

You can then divide by 365.25 to get the number of years.

If you use today - dob on its own, you get the answer as a difftime object rather than a plain number.

Try this code on your data:

Age <- as.vector(df$today - df$dob)  # number of days
Age / 365.25                         # approximate age in years
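
To make the difference between the raw subtraction and the converted value visible, here is a small base R sketch. It assumes a cut-down version of the question's data (the implausible future date is left out), and the variable names diff_days, age_days and age_years are chosen only for illustration.

```r
df <- data.frame(
  dob   = as.Date(c("2017-12-01", NA)),
  today = as.Date(c("2020-09-25", "2020-09-25"))
)

# Subtracting two Date vectors returns a difftime object, not a plain number
diff_days <- df$today - df$dob
class(diff_days)

# Convert to a plain numeric number of days, stating the unit explicitly
age_days <- as.numeric(diff_days, units = "days")

# Approximate age in years, using an average year length of 365.25 days
age_years <- age_days / 365.25
age_years
```

Passing units = "days" to as.numeric() is a defensive choice: it keeps the result in days regardless of which unit the difftime object happens to carry.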

Job Nmadu