2

Sorry for the somewhat confusing title but I'm struggling with giving a better, brief description of my problem. I have a data frame with people, their birth dates and dates of events of multiple types.

library(dplyr)
library(lubridate)

data <- data.frame(PersonID = c(1,1,2,3,4,4,4,5,6,6),
                   dateBirth = c('2000-12-05', '2000-12-05', '1979-06-11', '1986-11-01', '2002-05-07', '2002-05-07', '2002-05-07', '2007-09-23', '1990-01-03', '1990-01-03'),
                   dateEventType1 = c('2022-01-17', NA, '2021-07-08', NA, NA, '2015-07-12', NA, NA, NA, NA),
                   dateEventType2 = c('2022-06-28', '2022-02-05', '2022-08-14', NA, NA, '2021-01-10', NA, '2022-06-08', '2021-05-01', '2021-04-19'),
                   dateEventType3 = c( '2021-01-19', '2020-12-23', '2021-06-15', '2020-06-21', '2020-10-18', '2020-10-08', '2020-10-14', '2020-07-19', '2021-01-16', '2021-08-08')) %>%
                   mutate_at(vars(2:5), ymd)

In my example, there are three types of events but in reality, there are about 20 and the number might grow so I don't like to "hard-code" a solution. Sometimes, one row has multiple events, sometimes there's only one event in a row but multiple rows per person. I want to know how old the person was at the time of each event. I'd wish for a function that looks at each column and puts an "Age at event" column next to it. I am comfortable with the lubridate-package so a solution using that and dplyr would be greatly preferred! This is the function I use currently:

ageAtEvent =
  year(as.period(interval(start = dateBirth,
                          end = dateEvent)))

Thank you all very much!

Klaus Peter
  • 125
  • 7

1 Answers1

2

I added missing commas to data and changed to ymd.

Here is a way to mutate across columns with a custom function

library(dplyr)
library(lubridate)

data <- data.frame(PersonID = c(1,1,2,3,4,4,4,5,6,6),
                   dateBirth = c('2000-12-05', '2000-12-05', '1979-06-11', '1986-11-01', '2002-05-07', '2002-05-07', '2002-05-07', '2007-09-23', '1990-01-03', '1990-01-03'),
                   dateEventType1 = c('2022-01-17', NA, '2021-07-08', NA, NA, '2015-07-12', NA, NA, NA, NA),
                   dateEventType2 = c('2022-06-28', '2022-02-05', '2022-08-14', NA, NA, '2021-01-10', NA, '2022-06-08', '2021-05-01', '2021-04-19') ,
                   dateEventType3 = c( '2021-01-19', '2020-12-23', '2021-06-15', '2020-06-21', '2020-10-18', '2020-10-08', '2020-10-14', '2020-07-19', '2021-01-16', '2021-08-08')) %>%
  mutate_at(vars(3:5), ymd)

ageAtEvent <- function(x, y) {year(as.period(interval(start = x, end = y)))}

data %>% 
  mutate(across(.cols = 3:5, 
                .fns = ~ ageAtEvent(dateBirth, .x), 
                .names = "age_{.col}"))

Update to interleave the results by column name:

data %>% 
  mutate(across(.cols=3:5, .fns= ~ ageAtEvent(dateBirth, .x), .names = "{.col}_age")) %>% 
  select(1:2, sort(colnames(.)))

dplyr: order columns alphabetically in R

M.Viking
  • 5,067
  • 4
  • 17
  • 33
  • 1
    Thank you very much, I corrected the code but failed to paste the corrected version … Your solution does what I was looking for. One minor detail. Is it possible that the newly created column is adjacent to the original column it's referring to? If not I can sort them manually, no biggie … May I tip you a coffee? – Klaus Peter Dec 20 '22 at 14:01
  • Oh, and one absolutely final request ;-) How would you go about creating age-group variables, i. e. that instead of "15" in the newly created variables "13 to 18" would appear? – Klaus Peter Dec 20 '22 at 14:42
  • Use the `cut()` function for that `cut(1:31, breaks = c(-Inf, 12, 18, 30, Inf), labels=c("0 to 12", "13 to 18", "19 to 30", "31 and over"))` – M.Viking Dec 20 '22 at 15:28
  • Didn't see your first request, it was in fact, prior to drinking coffee :) have a great day – M.Viking Dec 20 '22 at 18:20