I would like to calculate age from a date of birth and a reference date. I have a large data set and am working with data.table.
I considered the following function, by user @MichaelChirico, available at: Efficient and accurate age calculation (in years, months, or weeks) in R given birth date and an arbitrary date.
library(data.table)
get_age <- function(birthdays, ref_dates){
  x <- data.table(bday <- unclass(birthdays),
                  #rem: how many days has it been since the lapse of the
                  #  most recent quadrennium since your birth?
                  rem = ((ref <- unclass(ref_dates)) - bday) %% 1461)
  #cycle_type: which of the four years following your birthday
  #  was the one that had 366 days?
  x[ , cycle_type :=
       foverlaps(data.table(start = bdr <- bday %% 1461L, end = bdr),
                 #these intervals were calculated by hand;
                 #  e.g., 59 is Feb. 28, 1970. I made the judgment
                 #  call to say that those born on Feb. 29 don't
                 #  have their "birthday" until the following March 1st.
                 data.table(start = c(0L, 59L, 424L, 790L, 1155L),
                            end = c(58L, 423L, 789L, 1154L, 1460L),
                            val = c(3L, 2L, 1L, 4L, 3L),
                            key = "start,end"))$val]
  I4 <- diag(4L)[ , -4L] #for conciseness below
  #The `by` approach might seem a little abstruse for those
  #  not familiar with `data.table`; see the edit history
  #  for a more palatable version (which is also slightly slower)
  x[ , extra :=
       foverlaps(data.table(start = rem, end = rem),
                 data.table(start = st <- cumsum(c(0L, rep(365L, 3L) +
                                                     I4[.BY[[1L]],])),
                            end = c(st[-1L] - 1L, 1461L),
                            int_yrs = 0:3, key = "start,end")
       )[ , int_yrs + (i.start - start) / (end + 1L - start)], by = cycle_type]
  #grand finale -- 4 years for every quadrennium, plus the fraction:
  4L * ((ref - bday) %/% 1461L) + x$extra
}
The problem is that some dates of birth are missing (NA), which causes the following error:
Caused by error in `foverlaps()`:
! NA values in data.table 'x' start column: 'start'. All rows with NA values in the range columns must be removed for foverlaps() to work.
I cannot drop these rows. I would like the age to be NA whenever the date of birth is NA. Any idea how I can do this?
For example:
test <- structure(list(city = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), date = c(10101992,
15101996, 1031997, 1061900, 13011870, 14071983, 11121995, NA,
11121995, 29021996), reference = c(20032023, 20032023, 20032023,
20032023, 20032023, 20032023, 20032023, 20032023, 20032023, 20032023
), date1 = structure(c(8318, 9784, 9921, -25416, -36512, 4942,
9475, NA, 9475, 9555), class = "Date"), reference1 = structure(c(19436,
19436, 19436, 19436, 19436, 19436, 19436, 19436, 19436, 19436
), class = "Date")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
test$date1 <- lubridate::dmy(test$date)
test$reference1 <- lubridate::dmy(test$reference)
test$age <- get_age(test$date1, test$reference1)
This gives me the error:
Error in foverlaps(data.table(start = bdr <- bday%%1461L, end = bdr), :
NA values in data.table 'x' start column: 'start'. All rows with NA values in the range columns must be removed for foverlaps() to work.
I would like to get:
structure(list(city = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), date = c(10101992,
15101996, 1031997, 1061900, 13011870, 14071983, 11121995, NA,
11121995, 29021996), reference = c(20032023, 20032023, 20032023,
20032023, 20032023, 20032023, 20032023, 20032023, 20032023, 20032023
), date1 = c("1992-10-10", "1996-10-15", "1997-03-01", "1900-06-01",
"1870-01-13", "1983-07-14", "1995-12-11", NA, "1995-12-11", "1996-02-29"
), reference1 = c("2023-03-20", "2023-03-20", "2023-03-20", "2023-03-20",
"2023-03-20", "2023-03-20", "2023-03-20", "2023-03-20", "2023-03-20",
"2023-03-20"), age = c(30.441095890411, 26.427397260274, 26.051912568306,
122.8, 153.178082191781, 39.6803278688525, 27.2704918032787,
NA, 27.2704918032787, 27.051912568306)), row.names = c(NA, 10L
), class = "data.frame")
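One workaround I can imagine is to call the age function only on rows where both dates are present and fill the remaining rows with NA. The following is a minimal sketch of that idea, not part of the linked answer: `get_age_na_safe` and its `age_fun` argument are hypothetical names I made up, and it assumes both input vectors have the same length.

```r
# Hypothetical helper (not from the linked answer): run an age function
# only on rows where both dates are present and return NA for the rest.
# Assumes `birthdays` and `ref_dates` have the same length.
get_age_na_safe <- function(birthdays, ref_dates, age_fun) {
  stopifnot(length(birthdays) == length(ref_dates))
  # TRUE where neither the birth date nor the reference date is missing
  ok <- !is.na(birthdays) & !is.na(ref_dates)
  # start with NA everywhere, then fill in the complete rows
  age <- rep(NA_real_, length(birthdays))
  if (any(ok)) {
    age[ok] <- age_fun(birthdays[ok], ref_dates[ok])
  }
  age
}

# Intended use with the function above (assumes get_age() is already defined):
# test$age <- get_age_na_safe(test$date1, test$reference1, get_age)
```

Because the NA rows never reach `foverlaps()`, the error should not be triggered, but I have not checked whether this is the most efficient approach on a large table.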