I'm currently experiencing a problem with my code. Being a beginner, I can't really find a solution.

library(data.table)
id<- c(rep(1,5), rep(2,5),rep(3,5))
time <-c (1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
death <-c(0,1,0,1,1,0,0,1,1,0,0,1,1,0,1)
table<-data.table(id, time, death)

I would like to add a final column that looks like this:

death_ideal<-c(0,1,1,1,1,0,0,1,1,1,0,1,1,1,1)

The idea is that once an "id" is dead, it stays dead. For example, id == 1 dies at time 2, so it cannot be "un-dead" at time 3. I have tried the code below. It is a bit complicated and it doesn't work (the errors I get are shown below).

j<- min(table$id)
while(j<=max(table$id)){
  i<-min(table$time[which(table$id==j)])
  while (i<=max(table$time[which(table$id==j)])){
    if (table$death[which(table$time == i)]==1){table$test[which(table$time == i)]<-1}
    else {if (table$test[which(table$time == i-1)]==1)
    {table$test[which(table$time == i)]<-1}
      else { table$test<-0 }}
    i = i+1}
  j = j+1}

Errors that I get:

Error in if (table$test[which(table$time == i - 1)] == 1) { : 
 argument is of length zero
In addition: Warning message:
In if (table$death[which(table$time == i)] == 1) { :
  the condition has length > 1 and only the first element will be used

This seems to be a rather common error, but I still can't make my code work. Thank you very much to anyone who can help!

MrFlick
TheMade

2 Answers

There are a lot of functions in R that can help with grouped data manipulation like this. You don't need an explicit for or while loop.

Since death_ideal stays at 1 once it reaches 1, you can use cummax, which computes the cumulative maximum, within each id.
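
To see why the cumulative maximum gives the desired result, here is a minimal sketch on a single id, using the death values for id == 1 from the question. The variable names are only for illustration.

```r
# Death indicator for id == 1, copied from the example data
death <- c(0, 1, 0, 1, 1)

# cummax() returns the running maximum, so once a 1 appears
# every later value is at least 1
death_ideal <- cummax(death)
print(death_ideal)
# [1] 0 1 1 1 1
```

Applying the same function separately within each id is what the grouped versions below do.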

This can be done in data.table.

library(data.table)
table[, death_ideal := cummax(death), by = id]
table

#    id time death death_ideal
# 1:  1    1     0           0
# 2:  1    2     1           1
# 3:  1    3     0           1
# 4:  1    4     1           1
# 5:  1    5     1           1
# 6:  2    1     0           0
# 7:  2    2     0           0
# 8:  2    3     1           1
# 9:  2    4     1           1
#10:  2    5     0           1
#11:  3    1     0           0
#12:  3    2     1           1
#13:  3    3     1           1
#14:  3    4     0           1
#15:  3    5     1           1

In base R,

table$death_ideal <- with(table, ave(death, id, FUN = cummax))

Or with dplyr:

library(dplyr)
table %>% group_by(id) %>% mutate(death_ideal = cummax(death))
Ronak Shah
  • Thank you very much! I used your second solution (in base R), which was the most efficient for me. But there is one detail about the formula that I don't understand: how does R know that it must look at the time column, given that the time column does not appear in your formula? Thanks again for your help. – TheMade Jul 21 '20 at 07:57
  • It doesn't look at the `time` column in my answer. I am assuming the `time` column is always sorted as shown in the example. If it is not, you might need to `order` the data first before applying the answer. – Ronak Shah Jul 21 '20 at 08:01
  • Understood, I have done that then. Thank you very much for your help, it seems to work just fine! – TheMade Jul 21 '20 at 09:03
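
The ordering caveat raised in the comments can be sketched as follows, in base R. This is only an illustration: the data frame name `df` and the shuffled row order are assumptions, not part of the original question.

```r
# Example data from the question, stored in a plain data frame
df <- data.frame(
  id    = c(rep(1, 5), rep(2, 5), rep(3, 5)),
  time  = rep(1:5, times = 3),
  death = c(0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1)
)

# Shuffle the rows to mimic unsorted input (assumption for illustration)
set.seed(1)
df <- df[sample(nrow(df)), ]

# Sort by id, then by time, so the running maximum follows chronological order
df <- df[order(df$id, df$time), ]

# Cumulative maximum of death within each id
df$death_ideal <- ave(df$death, df$id, FUN = cummax)
```

Without the `order()` step, `cummax` would carry a 1 forward in row order rather than in time order.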

Here is a base R option with ave:

> with(table,as.numeric(ave(death,id,FUN = cumsum)>0))
 [1] 0 1 1 1 1 0 0 1 1 1 0 1 1 1 1
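
For a 0/1 indicator, testing whether the running sum is positive is equivalent to taking the running maximum. A small check on the question's data, with variable names chosen only for illustration:

```r
# Example data from the question
id    <- rep(1:3, each = 5)
death <- c(0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1)

# Running sum per id, turned into a 0/1 flag
via_cumsum <- as.numeric(ave(death, id, FUN = cumsum) > 0)

# Running maximum per id
via_cummax <- ave(death, id, FUN = cummax)

# TRUE when both approaches give the same vector
same <- identical(via_cumsum, via_cummax)
```

The two approaches agree here because death only takes the values 0 and 1.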
ThomasIsCoding