13

I'm using the survival library. After computing the Kaplan-Meier estimator of a survival function:

km = survfit(Surv(time, flag) ~ 1)

I know how to compute percentiles:

quantile(km, probs = c(0.05,0.25,0.5,0.75,0.95))

But, how do I compute the mean survival time?

isekaijin
  • 19,076
  • 18
  • 85
  • 153

3 Answers3

25

Calculate Mean Survival Time

The mean survival time will in general depend on what value is chosen for the maximum survival time. You can get the restricted mean survival time with print(km, print.rmean=TRUE). By default, this assumes that the longest survival time is equal to the longest survival time in the data. You can set this to a different value by adding an rmean argument (e.g., print(km, print.rmean=TRUE, rmean=250)).

Extract Value of Mean Survival Time and Store in an Object

In response to your comment: I initially figured one could extract the mean survival time by looking at the object returned by print(km, print.rmean=TRUE), but it turns out that print.survfit doesn't return a list object but just returns text to the console.

Instead, I looked through the code of print.survfit (you can see the code by typing getAnywhere(print.survfit) in the console) to see where the mean survival time is calculated. It turns out that a function called survmean takes care of this, but it's not an exported function, meaning R won't recognize the function when you try to run it like a "normal" function. So, to access the function, you need to run the code below (where you need to set rmean explicitly):

survival:::survmean(km, rmean=60) 

You'll see that the function returns a list where the first element is a matrix with several named values, including the mean and the standard error of the mean. So, to extract, for example, the mean survival time, you would do:

survival:::survmean(km, rmean=60)[[1]]["*rmean"]

Details on How the Mean Survival Time is Calculated

The help for print.survfit provides details on the options and how the restricted mean is calculated:

?print.survfit 

The mean and its variance are based on a truncated estimator. That is, if the last observation(s) is not a death, then the survival curve estimate does not go to zero and the mean is undefined. There are four possible approaches to resolve this, which are selected by the rmean option. The first is to set the upper limit to a constant, e.g.,rmean=365. In this case the reported mean would be the expected number of days, out of the first 365, that would be experienced by each group. This is useful if interest focuses on a fixed period. Other options are "none" (no estimate), "common" and "individual". The "common" option uses the maximum time for all curves in the object as a common upper limit for the auc calculation. For the "individual"options the mean is computed as the area under each curve, over the range from 0 to the maximum observed time for that curve. Since the end point is random, values for different curves are not comparable and the printed standard errors are an underestimate as they do not take into account this random variation. This option is provided mainly for backwards compatability, as this estimate was the default (only) one in earlier releases of the code. Note that SAS (as of version 9.3) uses the integral up to the last event time of each individual curve; we consider this the worst of the choices and do not provide an option for that calculation.

eipi10
  • 91,525
  • 24
  • 209
  • 285
  • Nice, thanks! Is there some way to directly store the restricted mean into a variable, or do I have to copy it from `print`'s output? – isekaijin Apr 02 '17 at 21:34
  • 2
    Thank you very much! I would upvote you another time, but I can't. :-| – isekaijin Apr 03 '17 at 00:29
  • 1
    Maybe the survival package was updated to achieve this since the question was asked, but today there's no need to use the "hidden" `survival:::survmean(km, rmean=60)`. Use just `summary(km)$table[,5:6]`, which gives you the RMST and its SE. The CI can be calculated using appropriate quantile of the normal distribution. – Bastian May 20 '22 at 01:55
  • @Bastian I think you mean `summary(km)$table[5:6]` which works fine for the RMST across the entire survival curve. However, if we want to extract a time-specific RMST, we still have no other solution but to use `survival:::survmean(km, rmean = 60)`. – Seanosapien Sep 12 '22 at 20:00
  • I'm looking to extract the same value, but I'm getting `Error: 'survmean' is not an exported object from 'namespace:survival'`, despite the fact that when I run `getAnywhere(print.survfit)` I can see `survmean`. Any help with this would be much appreciated. – gladys_c_hugh Oct 28 '22 at 11:22
  • In answer to my own question: can use `summary(km, rmean = 60)$table[,"rmean"]` - I guess this is what @Bastian was getting at, but missing the `rmean = 60` argument in the summary call. – gladys_c_hugh Oct 28 '22 at 13:17
2

Using the tail formula (and since our variable is non negative) you can calculate the mean as the integral from 0 to infinity of 1-CDF, which equals the integral of the Survival function.

If we replace a parametric Survival curve with a non parametric KM estimate, the survival curve goes only until the last time point in our dataset. From there on it "assumes" that the line continues straight. So we can use the tail formula in a "restricted" manner only until some cut-off point, which we can define (default is the last time point in our dataset).

You can calculate it using the print function, or manually:

print(km, print.rmean=TRUE) # print function
sum(diff(c(0,km$time))*c(1,km$surv[1:(length(km$surv)-1)])) # manually

I add 0 in the beginning of the time vector, and 1 at the beginning of the survival vector since they're not included. I only take the survival vector up to the last point, since that is the last chunk. This basically calculates the area-under the survival curve up to the last time point in your data.

If you set up a manual cut-off point after the last point, it will simply add that area; e.g., here:

print(km, print.rmean=TRUE, rmean=4) # gives out 1.247
print(km, print.rmean=TRUE, rmean=4+2) # gives out 1.560
1.247+2*min(km$surv) # gives out 1.560

If the cut-off value is below the last, it will only calculate the area-under the KM curve up to that point.

Maverick Meerkat
  • 5,737
  • 3
  • 47
  • 66
  • Thanks for your answer! I know the theory of how this works, but at the time I didn't know the specifics of how to do this in R, and I really wanted to make sure I wasn't missing an option in the `survival` library that would do the computation for me. – isekaijin May 01 '22 at 13:35
  • 2
    Welcome. I wanted to add this, especially for the manual calculations. – Maverick Meerkat May 02 '22 at 07:59
2

There's no need to use the "hidden" survival:::survmean(km, rmean=60).

Use just summary(km)$table[,5:6], which gives you the RMST and its SE. The CI can be calculated using appropriate quantile of the normal distribution.

Bastian
  • 313
  • 1
  • 13