R Function for Handling Survival Data in intervals

Question

Hello, I am learning about survival analysis and was curious whether I could use the survival package on survival data of this form:

Here is some code to generate data in this form:

start_interval <-  seq(0, 13)
end_interval <-  seq(1, 14)
living_at_start <- round(seq(1000, 0, length.out = 14))
dead_in_interval <- c(abs(diff(living_at_start)), 0)
df <- data.frame(start_interval, end_interval, living_at_start, dead_in_interval)

From my use of the survival package so far, it seems to expect one survival time per individual, but I might be misreading the documentation of the Surv function. If survival will not work, what other packages are out there for this type of data? If there is no package or function that estimates the survival function from data like this, I can calculate the survival times myself with the following equation.

Please do not post photos of data or code! If you do, people who are willing to help you would have to type out all that text. Instead provide a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) P.S. Here is [a good overview on how to ask a good question](https://stackoverflow.com/help/how-to-ask) — dario, Oct 05 '21 at 13:54
For survival analyses using the survival package, you should have one observation per patient which is standard in the field — csgroen, Oct 05 '21 at 14:03
@dario added some code to help generate some data in this form — Victor Feagins, Oct 05 '21 at 14:19
@csgroen Would I need to transform the data by replicating each survival interval by its number of deaths if I wanted to use the survival package? — Victor Feagins, Oct 05 '21 at 14:21
I think so... the package is not built with the data in the format you have in mind. However, you can probably manually plot a Kaplan-Meier curve with what you have. — csgroen, Oct 05 '21 at 15:31
@csgron I did the duplication. I answered the question with this work. It gets the same answer as doing the manual calculation but there are a few things I am uncertain of. — Victor Feagins, Oct 05 '21 at 15:42

score 2 · Accepted Answer · answered Oct 05 '21 at 15:38

Since the survival package needs one observation per survival time, we need to transform the data first. The steps below use the simulated data.

Simulated Data:

library(survival)
start_interval <-  seq(0, 13)
end_interval <-  seq(1, 14)
living_at_start <- round(seq(1000, 0, length.out = 14))
dead_in_interval <- c(abs(diff(living_at_start)), 0)
df <- data.frame(start_interval, end_interval, living_at_start, dead_in_interval)

Transforming the data by duplicating each row by the number of deaths in its interval

duptimes <- df$dead_in_interval
rid <- rep(1:nrow(df), duptimes)
df.t <- df[rid,]
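
As a quick sanity check on the duplication step, the expanded data frame should have exactly one row per death. The sketch below rebuilds the simulated data and confirms this; the names `expanded_ok` and `deaths_per_interval` are introduced here only for illustration.

```r
# Rebuild the grouped data from the question.
start_interval <- seq(0, 13)
end_interval <- seq(1, 14)
living_at_start <- round(seq(1000, 0, length.out = 14))
dead_in_interval <- c(abs(diff(living_at_start)), 0)
df <- data.frame(start_interval, end_interval, living_at_start, dead_in_interval)

# Repeat each row index once per death in that interval.
rid <- rep(seq_len(nrow(df)), times = df$dead_in_interval)
df.t <- df[rid, ]

# One row per death, so the row count should equal the total number of deaths.
expanded_ok <- nrow(df.t) == sum(df$dead_in_interval)

# Deaths per interval, recovered from the expanded data.
deaths_per_interval <- table(df.t$start_interval)
```

Intervals with zero deaths (here the last one) drop out of the expanded data entirely, which is harmless when there is no censoring.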

Using the Surv Function

test <- Surv(time = df.t$start_interval,
             time2 = df.t$end_interval,
             event = rep(1, nrow(df.t)), # every observation is a death
             type = "interval")

Fitting the survival curve

summary(survfit(test ~ 1))
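
A possible alternative that avoids duplicating rows is to pass the death counts as case weights to `survfit`. This is only a sketch under two assumptions that are not in the original answer: every death is placed at the end of its interval, and there is no censoring. The names `df_pos`, `status`, `fit_w`, `expected` and `matches` are made up for the example.

```r
library(survival)

# Rebuild the grouped data from the question.
start_interval <- seq(0, 13)
end_interval <- seq(1, 14)
living_at_start <- round(seq(1000, 0, length.out = 14))
dead_in_interval <- c(abs(diff(living_at_start)), 0)
df <- data.frame(start_interval, end_interval, living_at_start, dead_in_interval)

# Keep only intervals with at least one death, so every case weight is positive.
df_pos <- df[df$dead_in_interval > 0, ]
df_pos$status <- 1  # assumption: every counted subject is an observed death

# Assumption: deaths are treated as occurring at the end of each interval.
fit_w <- survfit(Surv(end_interval, status) ~ 1,
                 data = df_pos,
                 weights = dead_in_interval)

# Without censoring, the weighted estimate should equal the surviving fraction.
expected <- 1 - cumsum(df_pos$dead_in_interval) / sum(df_pos$dead_in_interval)
matches <- isTRUE(all.equal(fit_w$surv, expected))
```

Whether end-of-interval placement is appropriate depends on how the intervals were recorded; the interval-censored `Surv` object used above makes a different choice.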

Comparing with the by-hand calculation from the original data

df$living_at_start/max(df$living_at_start)

They match.
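
The simple ratio above works because nobody is censored in the simulated data. A more general by-hand form is the life-table product of conditional survival probabilities, 1 - d/n per interval. The base R sketch below shows that it reduces to the same ratio here; the names `cond_surv`, `surv_end` and `surv_start` are introduced only for this illustration.

```r
# Rebuild the grouped data from the question.
start_interval <- seq(0, 13)
end_interval <- seq(1, 14)
living_at_start <- round(seq(1000, 0, length.out = 14))
dead_in_interval <- c(abs(diff(living_at_start)), 0)
df <- data.frame(start_interval, end_interval, living_at_start, dead_in_interval)

n <- df$living_at_start   # number at risk at the start of each interval
d <- df$dead_in_interval  # deaths within each interval

# Conditional probability of surviving each interval; an interval with
# nobody at risk contributes a factor of 1.
cond_surv <- ifelse(n > 0, 1 - d / n, 1)

# Survival at the end of each interval is the running product.
surv_end <- cumprod(cond_surv)

# Survival at the start of each interval: 1 at time 0, then shifted by one.
surv_start <- c(1, head(surv_end, -1))

# Agrees with the simple ratio used in the answer.
same_as_ratio <- isTRUE(all.equal(surv_start, n / max(n)))
```

With censoring, `n` would shrink by more than the deaths alone, and the product form would no longer equal the plain ratio.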

Questions

When using the survfit function, why is the number at risk 1001 at time 0 when there are only 1000 people in the data?

length(test)

Regarding the different number at risk I asked the question on Cross Validated [link](https://stats.stackexchange.com/questions/547141/survival-package-why-does-interval-censoring-have-a-different-number-at-risk) — Victor Feagins, Oct 05 '21 at 16:34
