Survival Analysis in R -- why it is always a straight line

Question

I am having a problem generating survival curves. I don't understand why the curve that represents the survival probability for each surgery year is sometimes a flat, straight line. I expected the line to vary over time and eventually settle at around 55%. My code is below, and I have attached a sample of the incorrect plot to illustrate what I mean. Any suggestions would be much appreciated. :)

Plot:

Code:

library(survival)
library(survminer)
library(dplyr)
library(ggplot2)
library(readxl)
library(tidyverse)

data_all <- data.frame(
  Years_Diff_Surg_Death = c(8.919917864, 8.895277207, 8.881587953,
                            8.821355236, 8.728268309, 8.709103354),
  Survival = c(1L, 0L, 1L, 1L, 1L, 1L)
)

data_2013 <- data.frame(Years_Diff_Surg_Death = c(36.99383984, 2.584531143, 36.91991786, 36.89527721, 36.88158795, 36.82135524), YEARS_OF_SURGERY = c(2013, 2013, 2013, 2013, 2013, 2013), Survival = c(1, 0, 1, 1, 1, 1))

# six rows, all assumed to be 2014 surgeries so every column has the same length
data_2014 <- data.frame(Years_Diff_Surg_Death = c(0.542094456, 5.196440794, 35.95619439, 35.91786448, 35.86584531, 35.8275154), YEARS_OF_SURGERY = c(2014, 2014, 2014, 2014, 2014, 2014), Survival = c(0, 0, 1, 1, 1, 1))
                        
data_2015 <- data.frame(Years_Diff_Surg_Death = c(34.4476386, 34.25598905,0.621492129, 34.38740589, 34.33264887, 1.081451061), YEARS_OF_SURGERY = c(2015, 2015, 2015, 2015, 2015, 2015), Survival = c(1, 1, 0, 1, 1, 0))
                                                                           
data_2016 <- data.frame(Years_Diff_Surg_Death = c(2.902121834, 0.950034223, 33.9301848, 33.91101985, 33.87268994, 33.85352498), YEARS_OF_SURGERY = c(2016,2016,2016, 2016, 2016, 2016), Survival = c(0, 0, 1, 1, 1, 1))
                                                
data_2017 <- data.frame(Years_Diff_Surg_Death = c(32.99110198, 3.348391513, 32.95277207,32.91170431, 32.87611225, 0.791238877), YEARS_OF_SURGERY = c(2017, 2017, 2017, 2017, 2017, 2017), Survival = c(1, 0, 1, 1, 1, 0)) 

fit_all <- survfit(Surv(Years_Diff_Surg_Death, Survival) ~ 1, data = data_all)

fit_2013 <- survfit(Surv(Years_Diff_Surg_Death, Survival) ~ YEARS_OF_SURGERY, data = data_2013)

fit_2014 <- survfit(Surv(Years_Diff_Surg_Death, Survival) ~ YEARS_OF_SURGERY, data = data_2014)

fit_2015 <- survfit(Surv(Years_Diff_Surg_Death, Survival) ~ YEARS_OF_SURGERY, data = data_2015)

fit_2016 <- survfit(Surv(Years_Diff_Surg_Death, Survival) ~ YEARS_OF_SURGERY, data = data_2016)

fit_2017 <- survfit(Surv(Years_Diff_Surg_Death, Survival) ~ YEARS_OF_SURGERY, data = data_2017)

fit_comb <- list(s_2013 = fit_2013,
                 s_2014 = fit_2014,
                 s_2015 = fit_2015,
                 s_2016 = fit_2016,
                 s_2017 = fit_2017,
                 s_all= fit_all)


ggsurvplot(fit_all, conf.int = TRUE,
          ylim = c(0,1),
          xlim = c(0,5),
          break.x.by = 1,
          title = "Years of Death After Surgery via Survival",
          xlab = ("Years"),
          legend = "none")

ggsurvplot(fit_2013, conf.int = TRUE,
           ylim = c(0,1),
           xlim = c(0,5),
           break.x.by = 1,
           title = ("Years of Death After Surgery via Survival"),
           xlab = ("Years"),
           legend = "none",
           risk.table = F)

ggsurvplot(fit_2014, conf.int = TRUE,
           ylim = c(0,1),
           xlim = c(0,5),
           break.x.by = 1,
           title = ("Years of Death After Surgery via Survival"),
           xlab = ("Years"),
           legend = "none",
           risk.table = F)

ggsurvplot(fit_2015, conf.int = TRUE,
           ylim = c(0,1),
           xlim = c(0,5),
           break.x.by = 1,
           title = ("Years of Death After Surgery via Survival"),
           xlab = ("Years"),
           legend = "none",
           risk.table = F)

ggsurvplot(fit_2016, conf.int = TRUE,
           ylim = c(0,1),
           xlim = c(0,5),
           break.x.by = 1,
           title = ("Years of Death After Surgery via Survival"),
           xlab = ("Years"),
           legend = "none",
           risk.table = F)

ggsurvplot(fit_2017, conf.int = TRUE,
           ylim = c(0,1),
           xlim = c(0,5),
           break.x.by = 1,
           title = ("Years of Death After Surgery via Survival"),
           xlab = ("Years"),
           legend = "none",
           risk.table = F)

ggsurvplot_combine(fit_comb,
                   data_ECV,
                   xlab = ("Years"),
                   xlim = c(0,5),
                   break.x.by = 1)

Please make sure your data is in a [reproducible format](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). There's a syntax error in `data_2013`. Plus those should probably be data.frames, not vectors. If I copy/paste this code as it is, it will not run — MrFlick, Jul 13 '22 at 19:28
Hi, MrFlick. Thank you very much for your comment. I have updated my code, and hopefully it works now. :) — LIU ZHICHENG, Jul 13 '22 at 19:39
Hi, rawr. Thank you so much for looking at my post and commenting. As you can see, some values in Years_Diff_Surg_Death are far beyond 5 years. That is because I gave those people a random value so that the code could read the data and run. I tried leaving those cells blank, but that did not work, so I filled them in to make sure every cell in my spreadsheet has a value. — LIU ZHICHENG, Jul 13 '22 at 19:40
There are still problems with the data. There's an empty value in the `data_2013` variable and there seems to be an extra "(" in the `data_2015` variable. Please copy/paste into your own R session and double check there are no errors. — MrFlick, Jul 13 '22 at 19:53
Hi, MrFlick. I am sorry about that. I have fixed it now. I pasted the dataset into my own RStudio session, and there were no error messages or warnings, so it should run now. Thanks again for your message. :) — LIU ZHICHENG, Jul 13 '22 at 20:09
I still get errors with the code. `data_2014` has different numbers of rows. The `fit_2013` line doesn't work because the time variable is not numeric. `fit_2014` doesn't work because `data_2014` doesn't work. `fit_2017` doesn't work because `YEARS_OF_SURVERY` was not found — MrFlick, Jul 13 '22 at 21:14
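
A minimal sketch of the non-numeric time problem raised in the comments, assuming the 2013 values arrive as character strings (for example, from a spreadsheet column read as text). It converts the columns before calling `survfit()`. The `Death` column is hypothetical and rests on a further assumption that the thread does not confirm: that `Survival = 1` means the patient was alive at last follow-up. `Surv()` treats 1 in its second argument as the event (death) and 0 as censored, so under that assumption the indicator would need to be flipped; otherwise the only "events" fall at the placeholder times, far outside the plotted 0 to 5 year window.

```r
library(survival)

# The question's 2013 values, stored as character strings
data_2013 <- data.frame(
  Years_Diff_Surg_Death = c("36.99383984", "2.584531143", "36.91991786",
                            "36.89527721", "36.88158795", "36.82135524"),
  YEARS_OF_SURGERY = rep("2013", 6),
  Survival = c("1", "0", "1", "1", "1", "1"),
  stringsAsFactors = FALSE
)

# Surv() needs a numeric time, so convert before fitting
data_2013$Years_Diff_Surg_Death <- as.numeric(data_2013$Years_Diff_Surg_Death)
data_2013$Survival <- as.integer(data_2013$Survival)

# Assumption (not confirmed in the thread): Survival == 1 means "alive".
# Death is a hypothetical event indicator: 1 = died, 0 = censored.
data_2013$Death <- 1L - data_2013$Survival

fit_2013 <- survfit(Surv(Years_Diff_Surg_Death, Death) ~ 1, data = data_2013)
summary(fit_2013)
```

If that assumption holds, the fitted curve steps down at each death inside the plotted window instead of staying flat. The placeholder times of roughly 36 years would still need to be replaced with each surviving patient's actual follow-up time, since a censored observation is only informative up to the time it was last observed.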
