   patient.id       date type
5        1053 2006/12/14   DX
2        1053  2007/4/21 HSCT
1        1053  2007/5/29   FU
6        1053  2007/7/20   FU
3        1053  2007/9/20   FU
4        1053 2007/11/18   D1
7        1138   2009/9/3   DX
13       1138   2010/2/3 HSCT
23       1138  2010/3/11   FU
10       1138   2010/6/6   FU
9        1138  2010/8/31   FU
15       1138  2010/11/5   FU
11       1138   2011/2/7   FU
16       1138  2011/5/15   FU
17       1138  2011/7/18   FU
14       1138  2011/9/21   FU
24       1138 2011/12/13   FU
19       1138  2012/3/13   FU
25       1138  2012/5/11   D1
BENY

2 Answers

A base R solution:

> lapply(with(dat, split(date, patient.id)), function(x) diff(range(x)))
$`1053`
Time difference of 339 days

$`1138`
Time difference of 981 days
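Note that this assumes dat$date is already of class Date. With the raw strings from the question, range() returns character values and diff() then fails with "non-numeric argument to binary operator". A minimal sketch that rebuilds the question's data (row names dropped) and converts the column first, assuming every date is written as year/month/day:

```r
# Sample data from the question (R's printed row names dropped)
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
patient.id date       type
1053       2006/12/14 DX
1053       2007/4/21  HSCT
1053       2007/5/29  FU
1053       2007/7/20  FU
1053       2007/9/20  FU
1053       2007/11/18 D1
1138       2009/9/3   DX
1138       2010/2/3   HSCT
1138       2010/3/11  FU
1138       2010/6/6   FU
1138       2010/8/31  FU
1138       2010/11/5  FU
1138       2011/2/7   FU
1138       2011/5/15  FU
1138       2011/7/18  FU
1138       2011/9/21  FU
1138       2011/12/13 FU
1138       2012/3/13  FU
1138       2012/5/11  D1
")

# The dates are read as character strings; convert them to Date so that
# range() and diff() do date arithmetic instead of failing
dat$date <- as.Date(dat$date, format = "%Y/%m/%d")

# Same call as in the answer: span from first to last record per patient
res <- lapply(with(dat, split(date, patient.id)), function(x) diff(range(x)))

# Optional: a plain named numeric vector of days instead of difftime objects
days <- sapply(res, as.numeric)
days
```

The sapply() line only turns the list of difftime objects into a named numeric vector; for this data it gives 339 and 981 days, matching the output above.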
Jilber Urbina
  • If the last type of a patient is HSCT or D2, the patient is right-censored, marked as 1; type D1 means death, marked as 0. How can I generate a column of such data? – WENWEN LI Mar 23 '19 at 21:20
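A minimal base R sketch for the coding described in this comment. It assumes the status depends only on each patient's last record by date (the code sorts to be safe), and it gives NA to any last type other than HSCT, D2 or D1, which the comment does not cover. The data is a trimmed copy of the question's table; dropping the intermediate FU rows does not change any patient's last record.

```r
# Trimmed copy of the question's data: intermediate FU rows omitted,
# which leaves each patient's last record unchanged
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
patient.id date       type
1053       2006/12/14 DX
1053       2007/4/21  HSCT
1053       2007/11/18 D1
1138       2009/9/3   DX
1138       2010/2/3   HSCT
1138       2012/5/11  D1
")
dat$date <- as.Date(dat$date, format = "%Y/%m/%d")

# Sort by patient, then date, so the last row per patient is the latest record
dat <- dat[order(dat$patient.id, dat$date), ]

# Keep only each patient's last record
last <- dat[!duplicated(dat$patient.id, fromLast = TRUE), ]

# Coding from the comment: HSCT or D2 -> 1 (right-censored), D1 -> 0 (death).
# Any other last type gets NA; that part is an assumption, not from the comment.
last$status <- ifelse(last$type %in% c("HSCT", "D2"), 1,
                      ifelse(last$type == "D1", 0, NA))
last[, c("patient.id", "type", "status")]
```

Both patients in the sample end with D1, so both get status 0. Be aware that survival::Surv() expects the opposite convention by default (1 = death, 0 = censored), so 1 - status would be needed if this column is passed to it.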

Use dplyr to convert the date column to Date format, then group by patient and calculate max(date) - min(date).

library(dplyr)
mydata %>% 
  mutate(date = as.Date(date, "%Y/%m/%d")) %>% 
  group_by(patient.id) %>% 
  summarise(Survival = as.numeric(max(date) - min(date)))

Result:

  patient.id Survival
       <int>    <dbl>
1       1053      339
2       1138      981
neilfws
  • How do I calculate the days from the HSCT record to the last record? DX means diagnosis and HSCT means transplant. – WENWEN LI Mar 23 '19 at 20:56
  • The time from the second record to the last record of each patient is the patient's survival time. HSCT means hematopoietic stem cell transplantation; DX means diagnosis. – WENWEN LI Mar 23 '19 at 21:00
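To measure from the transplant rather than from diagnosis, one approach is to take each patient's earliest HSCT date and subtract it from that patient's latest date. A base R sketch on a trimmed copy of the question's data (intermediate FU rows omitted, which does not affect the result), assuming the earliest HSCT is the one that counts and returning NA for a patient with no HSCT record:

```r
# Trimmed copy of the question's data: intermediate FU rows omitted
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
patient.id date       type
1053       2006/12/14 DX
1053       2007/4/21  HSCT
1053       2007/11/18 D1
1138       2009/9/3   DX
1138       2010/2/3   HSCT
1138       2012/5/11  D1
")
dat$date <- as.Date(dat$date, format = "%Y/%m/%d")

# Days from the earliest HSCT record to the latest record, per patient
hsct_to_last <- sapply(split(dat, dat$patient.id), function(d) {
  hsct <- d$date[d$type == "HSCT"]
  if (length(hsct) == 0) return(NA_real_)  # no transplant recorded
  as.numeric(max(d$date) - min(hsct), units = "days")
})
hsct_to_last
```

For this data that gives 211 days for patient 1053 and 828 days for patient 1138. In the dplyr pipeline from the answer, the equivalent change would be summarise(Survival = as.numeric(max(date) - min(date[type == "HSCT"]))).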