
I'm new to Stack Overflow and this is my first question :) I was wondering whether there is a function similar to "geom_smooth" for plots generated with "ggsurvplot". Here is an example of what I want to do, using the "ovarian" dataset from the R survival package:

Create a "survival object":

library(survival)
surv_object <- Surv(time = ovarian$futime, event = ovarian$fustat)

Fit the survival curves (one per treatment group, rx) from the survival object:

fit1 <- survfit(surv_object ~ rx, data = ovarian)

Plot the "survfit" object:

library(survminer)
ggsurvplot(fit1, data = ovarian, pval = TRUE)

Thanks a lot, Valérian

Lucas Morin
ValZee
  • Traditional survival curves are always step plots. It would be weird to see them as a smooth curve. Are you really sure that's what you want? If so, what technique do you want to use to "smooth" the curve? When asking for help you should always include a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data that we can use to test and verify possible solutions. – MrFlick Mar 19 '19 at 15:18
  • Hi MrFlick, thanks for your answer. Yes, I thought smoothing the curve would better show the (significant) treatment difference I'm trying to illustrate. Regarding the smoothing technique: I don't really know, I guess method="loess" will do. Oops, regarding the reproducible example, I thought I had done enough by providing an example that uses an R dataset (i.e. ovarian). I just produced an example to illustrate my issue, one to which a smoothing function could be applied, and did not have any possible solutions to verify. Should I do it differently next time? Thanks a lot! – ValZee Mar 19 '19 at 16:57
  • I didn't realize "ovarian" was a built in dataset from the survival package. Sorry, that is a good data set to use. – MrFlick Mar 19 '19 at 19:02
  • No prob, thanks a lot for your help – ValZee Mar 20 '19 at 08:10
  • `Error in ggsurvplot(fit1, data = ovarian, pval = TRUE) : could not find function "ggsurvplot"` – IRTFM Mar 24 '19 at 03:10
  • Hi 42, the ggsurvplot is in the package survminer. Sorry, I forgot to mention it. – ValZee Mar 25 '19 at 07:46
  • @MrFlick: this is not the case in every industry. See for example [2018 Annual Global Corporate Default And Rating Transition Study](https://www.spratings.com/documents/20184/774196/2018AnnualGlobalCorporateDefaultAndRatingTransitionStudy.pdf), p9, cumulative default rate by rating, which is 1 - survival function. – Lucas Morin Sep 25 '19 at 15:45
  • @lcrmorin. What chart are you referring to in this document? The question still remains, exactly what smoothing technique do you want to use? How do you want to be able to interpret the resulting plot? Do you want to make a stronger modeling assumption? – MrFlick Sep 25 '19 at 17:40
  • The cumulative default rate by time horizon, p9. I don't think it needs stronger modelling assumptions, just a simple way to plot diagonal lines between points instead of the step curve drawn by ggsurvplot. – Lucas Morin Sep 26 '19 at 07:42

2 Answers


A "simple way to plot diagonal lines between points" is to extract the estimates from the "survfit" object, where we are interested in time, surv, and the strata. In this example both strata have the same number of time points, so we simply repeat each stratum id length(fit1$surv) / 2 times.

# survfit object
library(survival)
fit1 <- survfit(Surv(time=ovarian$futime, event=ovarian$fustat) ~ rx, data=ovarian)

# extraction
d1 <- with(fit1, data.frame(time, surv, strata=rep(1:2, each=length(surv) / 2)))
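If the strata do not have the same number of time points, the rep(1:2, each = ...) shortcut no longer lines up. A minimal sketch of a more general extraction, assuming the usual "survfit" layout in which fit1$strata is a named vector of per-stratum counts; the `start` data frame is a hypothetical helper, added only so that each line begins at time 0 with survival 1:

```r
library(survival)

fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian)

# fit1$strata holds the number of time points per stratum, named "rx=1", "rx=2",
# so the labels can be repeated without assuming equal group sizes
d1 <- with(fit1, data.frame(time, surv, strata = rep(names(strata), strata)))

# Optional: add the starting point (time 0, survival 1) for each stratum
start <- data.frame(time = 0, surv = 1, strata = names(fit1$strata))
d1 <- rbind(start, d1)
d1 <- d1[order(d1$strata, d1$time), ]
```

Here strata holds the labels "rx=1" and "rx=2" rather than 1 and 2, so plotting code that selects strata by number would need to use these labels instead.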

Then we may plot the estimates for each stratum separately.

cols <- c("red", "blue")
plot(d1$time, d1$surv, type="n", ylim=0:1)
sapply(1:2, function(x) with(d1[d1$strata == x, ], lines(time, surv, type="l", col=cols[x])))
legend("topright", legend=c("rx1", "rx2"), lty=1, col=cols, title="Strata")


Or using ggplot2, something like this:

library(ggplot2)

ggplot(d1, aes(x=time, y=surv, group=strata, col=strata)) +
  geom_line() +
  ylim(0:1) +
  scale_colour_identity()


Note that this only addresses the programming side of the question. The assumptions behind any smoothing probably deserve a separate discussion, e.g. on Cross Validated.

jay.sf

You don't want to smooth your data, but to plot curves instead of lines.

Full credit for the method goes to @Z.Lin (https://stackoverflow.com/a/54900769/9406040). Thanks also to @jay.sf for extracting the data from the survfit object.

Data

library(tidyverse)
library(survival)

surv_object <- Surv(time = ovarian$futime, event = ovarian$fustat)
fit1 <- survfit(surv_object ~ rx, data = ovarian)

d1 <- with(fit1, data.frame(time, surv,
                            strata = as.factor(rep(1:2, each=length(fit1$surv) / 2))))

d2 <- d1 %>%
  group_by(strata) %>%
  summarise(x = list(spline(time, surv, n = 200, method = "natural")[["x"]]),
            y = list(spline(time, surv, n = 200, method = "natural")[["y"]])) %>%
  tidyr::unnest(cols = c("x", "y"))
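One caveat with method = "natural": a natural cubic spline is not constrained to be monotone, so the interpolated curve can in principle rise between points or leave the [0, 1] range. A minimal sketch of an alternative, assuming that a monotone interpolation is acceptable for the plot: base R's spline() also offers method = "hyman", which keeps the curve non-increasing when the input is. The name `d2_mono` is hypothetical, and the sketch uses base R instead of the tidyverse pipeline above:

```r
library(survival)

fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian)
d1 <- with(fit1, data.frame(time, surv, strata = rep(names(strata), strata)))

# Interpolate each stratum with a monotone (Hyman-filtered) cubic spline
d2_mono <- do.call(rbind, lapply(split(d1, d1$strata), function(d) {
  s <- spline(d$time, d$surv, n = 200, method = "hyman")
  data.frame(strata = d$strata[1], x = s$x, y = s$y)
}))
```

d2_mono has the same columns as d2 (strata, x, y), so it can be passed to geom_line() in the same way.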

Plot

ggplot() + 
  geom_point(data = d1,
             aes(time, surv, color = strata)) +
  geom_line(data = d2,
            aes(x, y, color = strata))

Comparison with smoothing

ggplot() + 
  geom_point(data = d1,
             aes(time, surv, color = strata)) +
  geom_smooth(data = d1,
              aes(time, surv, color = strata),
              se = FALSE)


Roman