Assign value based on age of occurrence of event in R

Question

I have a longitudinal dataset with participant age and a 0/1 variable that marks the age from which a participant has experienced an event, as follows.

id  age  event
 1    0     0
 1    1     0
 1    2     0
 1    3     0
 1    4     0
 1    5     0
 2    0     0
 2    1     1
 2    2     1
 2    3     1
 2    4     1
 2    5     1 
 3    0     0
 3    1     0
 3    2     0
 3    3     1
 3    4     1
 3    5     1

I want to generate a new variable called timing that assigns each participant to one group: 0 if the event never happened, 1 if it happened before the age of 2 yrs, or 2 if it happened after the age of 2 yrs. The expected result is as follows:

id  age  event  timing  
 1    0     0      0
 1    1     0      0
 1    2     0      0
 1    3     0      0
 1    4     0      0
 1    5     0      0
 2    0     0      1
 2    1     1      1
 2    2     1      1
 2    3     1      1
 2    4     1      1
 2    5     1      1
 3    0     0      2
 3    1     0      2
 3    2     0      2
 3    3     1      2
 3    4     1      2
 3    5     1      2

I don't have great coding skills and would really appreciate it if anyone could assist.

If the event never happened, shouldn't the value in `timing` be 0? — Chris Ruehlemann, Feb 11 '20 at 13:25

score 0 · Answer 1 · answered Feb 11 '20 at 12:46

1. Create a minimal reproducible example:

df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L),
                     age = c(0L, 1L, 2L, 3L, 4L, 5L, 0L, 1L, 2L, 3L, 4L, 5L, 0L, 1L, 2L, 3L, 4L, 5L),
                     event = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L)),
                row.names = c(NA, -18L), class = "data.frame")

2. A solution using dplyr:

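library(dplyr)

df %>% 
  left_join(df %>% 
              # classify each row: 1 = event before age 2, 2 = event at age 2 or later, 0 = no event
              mutate(timing = if_else(age < 2 & event == 1, 1,
                                      if_else(age >= 2 & event == 1, 2, 0))) %>% 
              group_by(id) %>%
              # collapse to one value per participant; an early event (1) takes priority
              summarize(timing = ifelse(1 %in% timing, 1,
                                        ifelse(2 %in% timing, 2, 0))),
            by = "id"  # join the per-participant value back onto every row
  )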

This returns:

   id age event timing
1   1   0     0      0
2   1   1     0      0
3   1   2     0      0
4   1   3     0      0
5   1   4     0      0
6   1   5     0      0
7   2   0     0      1
8   2   1     1      1
9   2   2     1      1
10  2   3     1      1
11  2   4     1      1
12  2   5     1      1
13  3   0     0      2
14  3   1     0      2
15  3   2     0      2
16  3   3     1      2
17  3   4     1      2
18  3   5     1      2
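
The same per-participant rule can probably also be written without the join, by computing the age at first occurrence inside each group. This is only a sketch under a couple of assumptions: the helper column `first_age` is a name made up for illustration, and a first event at exactly age 2 is put in group 2, following the `age >= 2` condition used above.

```r
library(dplyr)

# Same example data as above, rebuilt so the block runs on its own
df <- data.frame(
  id    = rep(1:3, each = 6),
  age   = rep(0:5, times = 3),
  event = c(0, 0, 0, 0, 0, 0,
            0, 1, 1, 1, 1, 1,
            0, 0, 0, 1, 1, 1)
)

result <- df %>%
  group_by(id) %>%
  mutate(
    # Age at first occurrence; NA when the participant never had the event
    first_age = if (any(event == 1)) as.numeric(min(age[event == 1])) else NA_real_,
    # 0 = never, 1 = first event before age 2, 2 = first event at age 2 or later
    timing = case_when(
      is.na(first_age) ~ 0,
      first_age < 2    ~ 1,
      TRUE             ~ 2
    )
  ) %>%
  ungroup() %>%
  select(-first_age)  # drop the hypothetical helper column
```

Because `first_age` is a single value per `id`, every row of a participant ends up in the same group.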

Thank you all for your responses. I found the solution by @dario very helpful; thank you! — LukN, Feb 14 '20 at 11:25

Chris Ruehlemann · Answer 2 · 2020-02-11T13:39:19.890

The conditions you've specified clash with the expected output, so it's hard to know for sure what you need. Intuitively, it would make sense for the value in timing to always be 0, whatever the age of the participant, if the event never happened. If that is correct, then the following nested ifelse clauses create the new variable:

# row-wise: 0 = no event, 1 = event at an age below 2, 2 = event at age 2 or later
df$timing <- ifelse(df$event == 0, 0,
                    ifelse(df$event == 1 & df$age < 2, 1, 2))

Result:

df
   id age event timing
1   1   0     0      0
2   1   1     0      0
3   1   2     0      0
4   1   3     0      0
5   1   4     0      0
6   1   5     0      0
7   2   0     0      0
8   2   1     1      1
9   2   2     1      2
10  2   3     1      2
11  2   4     1      2
12  2   5     1      2
13  3   0     0      0
14  3   1     0      0
15  3   2     0      0
16  3   3     1      2
17  3   4     1      2
18  3   5     1      2

Hi @Chris Ruehlemann. Thank you for your response. The only issue with your solution is that, for the same participant, the code assigns observations to different groups, i.e. 0, 1 or 2. What I needed is to assign all observations for a participant to either 0, 1 or 2 based on the first appearance of the event. — LukN, Feb 14 '20 at 11:28
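
One way the requirement in this comment (a single group per participant, based on the first appearance of the event) might be met in base R is with `ave()`, which applies a function within each `id` and repeats the result on every row of that `id`. This is a rough sketch rather than a tested solution for the real data: `event_age` and `first_age` are names invented here, and a first event at exactly age 2 is assumed to belong to group 2.

```r
# Example data from the question, rebuilt so the block runs on its own
df <- data.frame(
  id    = rep(1:3, each = 6),
  age   = rep(0:5, times = 3),
  event = c(0, 0, 0, 0, 0, 0,
            0, 1, 1, 1, 1, 1,
            0, 0, 0, 1, 1, 1)
)

# Age on rows where the event is present; Inf on rows without the event
event_age <- ifelse(df$event == 1, df$age, Inf)

# Earliest event age per participant, repeated on every row of that participant
first_age <- ave(event_age, df$id, FUN = min)

# 0 = never, 1 = first event before age 2, 2 = first event at age 2 or later
df$timing <- ifelse(is.infinite(first_age), 0,
                    ifelse(first_age < 2, 1, 2))
```

Using `Inf` as the placeholder means participants without any event keep a minimum of `Inf`, which is then mapped to group 0.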
