
I have a longitudinal dataset with participant age and a binary variable (0/1) that shows the age at which a participant experienced an event, as follows:

id  age  event
 1    0     0
 1    1     0
 1    2     0
 1    3     0
 1    4     0
 1    5     0
 2    0     0
 2    1     1
 2    2     1
 2    3     1
 2    4     1
 2    5     1 
 3    0     0
 3    1     0
 3    2     0
 3    3     1
 3    4     1
 3    5     1

Based on whether the event never happened (0), happened before the age of 2 years (1), or happened after the age of 2 years (2), I want to generate a new variable called timing and assign each participant to a group (0, 1, 2) as follows:

id  age  event  timing  
 1    0     0      0
 1    1     0      0
 1    2     0      0
 1    3     0      0
 1    4     0      0
 1    5     0      0
 2    0     0      1
 2    1     1      1
 2    2     1      1
 2    3     1      1
 2    4     1      1
 2    5     1      1
 3    0     0      2
 3    1     0      2
 3    2     0      2
 3    3     1      2
 3    4     1      2
 3    5     1      2

I don't have great coding skills and would really appreciate it if anyone could assist.

Edward
LukN

2 Answers


1. Create a minimal reproducible example:

df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L),
                     age = c(0L, 1L, 2L, 3L, 4L, 5L, 0L, 1L, 2L, 3L, 4L, 5L, 0L, 1L, 2L, 3L, 4L, 5L),
                     event = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L)),
                row.names = c(NA, -18L), class = "data.frame")

2. A solution using dplyr:

library(dplyr)

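df %>% 
  left_join(
    df %>% 
      # Row-level flag: 1 = event before age 2, 2 = event at age 2 or later, 0 = no event
      mutate(timing = if_else(age < 2 & event == 1, 1,
                              if_else(age >= 2 & event == 1, 2, 0))) %>% 
      group_by(id) %>%
      # Collapse to one value per participant; an early event (1) takes precedence
      summarize(timing = ifelse(1 %in% timing, 1,
                                ifelse(2 %in% timing, 2, 0))),
    by = "id"  # join the per-participant value back onto every row
  )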

This returns:

   id age event timing
1   1   0     0      0
2   1   1     0      0
3   1   2     0      0
4   1   3     0      0
5   1   4     0      0
6   1   5     0      0
7   2   0     0      1
8   2   1     1      1
9   2   2     1      1
10  2   3     1      1
11  2   4     1      1
12  2   5     1      1
13  3   0     0      2
14  3   1     0      2
15  3   2     0      2
16  3   3     1      2
17  3   4     1      2
18  3   5     1      2
dario
  • Thank you all for your responses. I found the solution by @dario very helpful; thank you! – LukN Feb 14 '20 at 11:25
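The dplyr answer above builds a per-participant summary and joins it back onto the data. The same rule can also be sketched with a grouped mutate, which avoids the join. This is only an illustrative variant: the helper column `first_event_age` is a made-up name, and the sketch assumes that an event first recorded at exactly age 2 belongs in group 2, as in the answer above.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  id    = rep(1:3, each = 6),
  age   = rep(0:5, times = 3),
  event = c(0, 0, 0, 0, 0, 0,
            0, 1, 1, 1, 1, 1,
            0, 0, 0, 1, 1, 1)
)

result <- df %>%
  group_by(id) %>%
  mutate(
    # Age at the first recorded event; Inf if the participant never had one
    # (first_event_age is a hypothetical helper column)
    first_event_age = min(age[event == 1], Inf),
    timing = case_when(
      is.infinite(first_event_age) ~ 0,  # event never happened
      first_event_age < 2          ~ 1,  # first event before age 2
      TRUE                         ~ 2   # first event at age 2 or later (assumption)
    )
  ) %>%
  ungroup() %>%
  select(-first_event_age)
```

Because `min(..., Inf)` always returns a single value per group, every row of a participant receives the same `timing`.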

The conditions you've specified clash with the expected output, so it's hard to know for sure what you need. Intuitively, the value in timing should always be 0, whatever the age of the participant, if the event never happened. If that is correct, then the following nested ifelse calls create the new variable:

# 0 = no event on this row, 1 = event before age 2, 2 = event at age 2 or later
df$timing <- ifelse(df$event == 0, 0,
                    ifelse(df$age < 2, 1, 2))

Result:

df
   id age event timing
1   1   0     0      0
2   1   1     0      0
3   1   2     0      0
4   1   3     0      0
5   1   4     0      0
6   1   5     0      0
7   2   0     0      0
8   2   1     1      1
9   2   2     1      2
10  2   3     1      2
11  2   4     1      2
12  2   5     1      2
13  3   0     0      0
14  3   1     0      0
15  3   2     0      0
16  3   3     1      2
17  3   4     1      2
18  3   5     1      2
Chris Ruehlemann
  • Hi @Chris Ruehlemann. Thank you for your response. The only issue with your solution is that the code assigns observations from the same participant to different groups (0, 1 or 2). What I needed is to assign all observations for a participant to either 0, 1 or 2 based on the first appearance of the event. – LukN Feb 14 '20 at 11:28
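
The requirement described in this comment (one group per participant, based on the first appearance of the event) can be sketched in base R with `ave()`. This is a rough sketch rather than a definitive answer: `event_age` and `first_event_age` are hypothetical helper names, and it assumes an event first recorded at exactly age 2 counts as group 2.

```r
# Same example data as in the question
df <- data.frame(
  id    = rep(1:3, each = 6),
  age   = rep(0:5, times = 3),
  event = c(0, 0, 0, 0, 0, 0,
            0, 1, 1, 1, 1, 1,
            0, 0, 0, 1, 1, 1)
)

# Age on rows where the event is present, Inf on all other rows
event_age <- ifelse(df$event == 1, df$age, Inf)

# Per-participant minimum, repeated on every row of that participant;
# it stays Inf for participants who never experienced the event
first_event_age <- ave(event_age, df$id, FUN = min)

# 0 = never, 1 = first event before age 2, 2 = first event at age 2 or later (assumption)
df$timing <- ifelse(is.infinite(first_event_age), 0,
                    ifelse(first_event_age < 2, 1, 2))
```

Since `ave()` returns a vector as long as its input, `timing` is constant within each `id`, which is what the row-wise ifelse version does not guarantee.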