8

I have a Cox proportional hazards model set up using the following code in R that predicts mortality. Covariates A, B and C are added simply to avoid confounding (i.e. age, sex, race) but we are really interested in the predictor X. X is a continuous variable.

cox.model <- coxph(Surv(time, dead) ~ A + B + C + X, data = df)

Now, I'm having troubles plotting a Kaplan-Meier curve for this. I've been searching on how to create this figure but I haven't had much luck. I'm not sure if plotting a Kaplan-Meier for a Cox model is possible? Does the Kaplan-Meier adjust for my covariates or does it not need them?

What I did try is below, but I've been told this isn't right.

plot(survfit(cox.model), xlab = 'Time (years)', ylab = 'Survival Probabilities')

I also tried to plot a figure that shows cumulative hazard of mortality. I don't know if I'm doing it right since I've tried it a few different ways and get different results. Ideally, I would like to plot two lines, one that shows the risk of mortality for the 75th percentile of X and one that shows the 25th percentile of X. How can I do this?

I could list everything else I've tried, but I don't want to confuse anyone!

Many thanks.

Hims
  • 103
  • 1
  • 1
  • 5
  • 1
    It's not that the "KM curves adjust for covariates" but rather that one can construct predicted step-function survival curves from model fits. Most people would use the term KM curve to refer to unadjusted survival curves. You also need to specify all the variables to make a prediction. See the coded example below. – IRTFM Aug 19 '15 at 20:06

3 Answers3

7

Here is an example taken from this paper.

url <- "http://socserv.mcmaster.ca/jfox/Books/Companion/data/Rossi.txt"
Rossi <- read.table(url, header=TRUE)
Rossi[1:5, 1:10]

#   week arrest fin age  race wexp         mar paro prio educ
# 1   20      1  no  27 black   no not married  yes    3    3
# 2   17      1  no  18 black   no not married  yes    8    4
# 3   25      1  no  19 other  yes not married  yes   13    3
# 4   52      0 yes  23 black  yes     married  yes    1    5
# 5   52      0  no  19 other  yes not married  yes    3    3

mod.allison <- coxph(Surv(week, arrest) ~ 
                        fin + age + race + wexp + mar + paro + prio,
                        data=Rossi)
mod.allison

# Call:
# coxph(formula = Surv(week, arrest) ~ fin + age + race + wexp + 
#    mar + paro + prio, data = Rossi)
#
#
#                   coef exp(coef) se(coef)      z      p
# finyes         -0.3794     0.684   0.1914 -1.983 0.0470
# age            -0.0574     0.944   0.0220 -2.611 0.0090
# raceother      -0.3139     0.731   0.3080 -1.019 0.3100 
# wexpyes        -0.1498     0.861   0.2122 -0.706 0.4800
# marnot married  0.4337     1.543   0.3819  1.136 0.2600
# paroyes        -0.0849     0.919   0.1958 -0.434 0.6600
# prio            0.0915     1.096   0.0286  3.194 0.0014
#
# Likelihood ratio test=33.3  on 7 df, p=2.36e-05  n= 432, number of events= 114    

Note that the model uses fin, age, race, wexp, mar, paro, prio to predict arrest. As mentioned in this document the survfit() function uses the Kaplan-Meier estimate for the survival rate.

plot(survfit(mod.allison), ylim=c(0.7, 1), xlab="Weeks",
     ylab="Proportion Not Rearrested")

survival estimate plot

We get a plot (with a 95% confidence interval) for the survival rate. For the cumulative hazard rate you can do

# plot(survfit(mod.allison)$cumhaz)

but this doesn't give confidence intervals. However, no worries! We know that H(t) = -ln(S(t)) and we have confidence intervals for S(t). All we need to do is

sfit <- survfit(mod.allison)
cumhaz.upper <- -log(sfit$upper)
cumhaz.lower <- -log(sfit$lower)
cumhaz <- sfit$cumhaz # same as -log(sfit$surv)

Then just plot these

plot(cumhaz, xlab="weeks ahead", ylab="cumulative hazard",
     ylim=c(min(cumhaz.lower), max(cumhaz.upper)))
lines(cumhaz.lower)
lines(cumhaz.upper)

cumhaz

You'll want to use survfit(..., conf.int=0.50) to get bands for 75% and 25% instead of 97.5% and 2.5%.

nathanesau
  • 1,681
  • 16
  • 27
  • I'm not sure that setting the conf.int = 0.50 is the same as plotting survival curve estimates for the 25th and 75th percentile of X values. I thought I would have had to use survfit.coxph function for the Kaplan-Meier curve, not sure about cumulative hazard though. – Hims Aug 18 '15 at 20:08
  • 1
    Mostly helpful but the last sentence is just wrong, and since that was the thrust of the question really needs to be fixed! – IRTFM Aug 19 '15 at 18:17
  • @Hims regarding your comment about ``survfit.coxph`` - based on the way that ``R`` deals with class objects, ``survfit.coxph`` is being called when I call ``survfit`` – nathanesau Aug 19 '15 at 18:27
  • I'm not sure if its possible - the ``survfit`` function estimates the survival function and derives the ``cumhaz`` prediction using this survival function (done with the Kaplan-Meier method). So we can't get separate the ``cumhaz`` prediction into separate hazards. – nathanesau Aug 19 '15 at 18:49
4

The request for estimated survival curve at the 25th and 75th percentiles for X first requires determining those percentiles and specifying values for all the other covariates in a dataframe to be used as newdata argument to survfit.:

Can use the data suggested by other resondent from Fox's website, although on my machine it required building an url-object:

 url <- url("http://socserv.mcmaster.ca/jfox/Books/Companion/data/Rossi.txt")
 Rossi <- read.table(url, header=TRUE)

It's probably not the best example for this wquestion but it does have a numeric variable that we can calculate the quartiles:

> summary(Rossi$prio)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   1.000   2.000   2.984   4.000  18.000 

So this would be the model fit and survfit calls:

 mod.allison <- coxph(Surv(week, arrest) ~ 
                         fin + age + race + prio ,
                         data=Rossi)
 prio.fit <- survfit(mod.allison, 
                     newdata= data.frame(fin="yes", age=30, race="black", prio=c(1,4) ))
 plot(prio.fit, col=c("red","blue"))

enter image description here

IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • is it possible to do a log-rank test between the red and blue curve? – jiggunjer Nov 03 '20 at 08:21
  • You could get a log-rank test of the significance of the `prio` variable in the `mod.allison` model under the conditions of the data on which it was built and estimated. Is that what you want? – IRTFM Nov 03 '20 at 16:58
  • I think so. If I do summary() of the model it returns a single logrank result for all the coeffients. I want to test between the 2 prio quartiles instead. – jiggunjer Nov 04 '20 at 01:53
  • I don’t think there is a method to do a statistical test as you describe it. – IRTFM Nov 04 '20 at 16:31
  • Looking at this again I think what would suffice would be to compare the two models with and without the prio covariate and take the difference in deviance and compare to Chi-square statistic with one degree of freedom. It is a log-likelihood test. – IRTFM Jan 24 '23 at 22:15
0

Setting the values of the confounders to a fixed value and plotting the predicted survival probabilities at multiple points in time for given values of X (as @IRTFM suggested in his answer), results in a conditional effect estimate. That is not what a standard Kaplan-Meier estimator is used for and I don't think that is what the original poster wanted. Usually we are interested in average causal effects. In other words: What would the survival probability be if X had been set to some specific value x in the entire sample?

We can obtain this probability using the cox-model that was fit plus g-computation. In g-computation, we set the value of X to x in the entire sample and then use the cox model to predict the survival probability at t for each individual, using their observed covariate values in the process. Then we simply take the average of those predictions to obtain our final estimate. By repeating this process for a range of points in time and a range of possible values for X, we obtain a three-dimensional survival surface. We can then visualize this surface using color scales.

This can be done using the contsurvplot R-package I developed, as discussed in this previous answer: Converting survival analysis by a continuous variable to categorical or in the documentation of the package. More information about this strategy in general can be found in the preprint version of my article on this topic: https://arxiv.org/pdf/2208.04644.pdf

Denzo
  • 240
  • 1
  • 7