
Is it possible to split episodes by a given variable in survival analysis in R, similar to what stsplit does in Stata, used in the following way: stsplit var, at(0) after(time=time)?

I am aware that survSplit in the survival package can split episodes at given cut points such as c(0,5,10,15). But if the split time, say time of divorce, differs for each individual, a single set of cut points cannot describe it, and the split has to be based on the value of a variable (say graduation, divorce, or job termination).

Does anyone know of a package or resource I could use for this?

elliezee
  • My guess is that this would be very achievable in R. Please provide us with a small, reproducible code snippet that we can copy and paste to better understand the issue and test possible solutions. You can share datasets with `dput(YOUR_DATASET)` or smaller samples with `dput(head(YOUR_DATASET))`. (See [this answer](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#5963610) for detailed instructions.) – ktiu Jun 07 '21 at 16:29
  • I think I got it! I will try to answer my own question. Hopefully people comment/correct. – elliezee Jun 15 '21 at 16:06

2 Answers


After some poking around, I think tmerge() in the survival package can do what stsplit var does in Stata: split episodes not just at fixed cut points (the same for all observations), but at the time an event occurs for each individual.

Before this, the only way I knew to split the data was at fixed cut points with survSplit():

library(survival) ## provides survSplit() and tmerge() ##

id<-c(1,2,3)
age<-c(19,20,29)
job<-c(1,1,0)
time<-age-16 ## create time since age 16 ## 

data<-data.frame(id,age,job,time)

  id age job time
1  1  19   1    3
2  2  20   1    4
3  3  29   0   13

## simple split by time ## 
## 0 up to 2 years, 2-5 years, 5+ years ## 

data2<-survSplit(data,cut=c(0,2,5),end="time",start="start",
                event="job")

  id age start time job
1  1  19     0    2   0
2  1  19     2    3   1
3  2  20     0    2   0
4  2  20     2    4   1
5  3  29     0    2   0
6  3  29     2    5   0
7  3  29     5   13   0

However, if I want to split by a certain variable, such as when each individual finished school, each person may have a different cut point (they finished school at different ages).

## split by time dependent variable (age finished school) ##
d1<-data.frame(id,age,time,job)

scend<-c(17,21,24)-16

d2<-data.frame(id,scend)

## create start/stop time ## 
base<-tmerge(d1,d1,id=id,tstop=time)
## create time-dependent covariate ## 
s1<-tmerge(base,d2,id=id,
           finish=tdc(scend))

  id age time job tstart tstop finish
1  1  19    3   1      0     1      0
2  1  19    3   1      1     3      1
3  2  20    4   1      0     4      0
4  3  29   13   0      0     8      0
5  3  29   13   0      8    13      1

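I think tmerge() is more or less comparable to the stsplit command in Stata. Note that id 2 is not split, because that person finished school (time 5) after the end of follow-up (time 4).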
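
One caveat about the output above: job is simply copied onto every row of a subject, so after the split it cannot be used directly as the event indicator. A minimal sketch of one way around this, assuming the same toy data, is to let tmerge() place the event with event(), so that it is only set on the interval where follow-up ends. The column name jobev is made up for this illustration.

```r
library(survival)

## same toy data as above ##
id    <- c(1, 2, 3)
age   <- c(19, 20, 29)
job   <- c(1, 1, 0)
time  <- age - 16
scend <- c(17, 21, 24) - 16

d1 <- data.frame(id, age, time, job)
d2 <- data.frame(id, scend)

## follow-up window from 0 to time for each subject ##
base <- tmerge(d1, d1, id = id, tstop = time)
## event indicator at the end of follow-up (jobev is a hypothetical name) ##
base <- tmerge(base, d1, id = id, jobev = event(time, job))
## time-dependent covariate: switches to 1 once school is finished ##
s1 <- tmerge(base, d2, id = id, finish = tdc(scend))

s1[, c("id", "tstart", "tstop", "jobev", "finish")]
```

With this layout, Surv(tstart, tstop, jobev) ~ finish would be the natural formula to pass to coxph(), although three subjects are obviously too few to fit anything.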

elliezee

Perhaps the Epi package is what you are looking for. It offers multiple ways to cut or split follow-up time using Lexis objects. See the documentation of cutLexis().

Zaw