Kaplan-Meier survival estimates for multiple variables in R

Question

I am trying to find Kaplan-Meier estimates for multiple variables.

I have a data set that looks like this:

      fw_year steroid_dos status     current_dos
1   6.3271732       0.0      0        7.5-14.9 mg
24  4.5530457       0.0      0             no-use
29  0.9137577       0.0      0             no-use
33  7.3675566     367.5      0       15.0-24.9 mg
42  3.3127995       0.0      0             no-use
51  9.8288841       0.0      0          >0-4.9 mg
53  8.3696098       0.0      0          >0-4.9 mg

I used `fit1 <- survfit(Surv_df ~ current_dos, data = df1)` to obtain the following results for each category:
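For readers who want to reproduce the fit, here is a minimal sketch that rebuilds only the seven rows shown above. It assumes that `Surv_df` was created as `Surv(fw_year, status)`; the question does not show this. All seven rows are censored (`status = 0`), so every estimate here is 1. The sketch only demonstrates the call and the `dput()` output that the answer asks for.

```r
library(survival)

# The seven rows shown in the question, rebuilt as copy-pasteable code.
# Assumption: Surv_df in the question is Surv(fw_year, status).
df1 <- data.frame(
  fw_year     = c(6.3271732, 4.5530457, 0.9137577, 7.3675566,
                  3.3127995, 9.8288841, 8.3696098),
  steroid_dos = c(0, 0, 0, 367.5, 0, 0, 0),
  status      = c(0, 0, 0, 0, 0, 0, 0),
  current_dos = c("7.5-14.9 mg", "no-use", "no-use", "15.0-24.9 mg",
                  "no-use", ">0-4.9 mg", ">0-4.9 mg"),
  stringsAsFactors = FALSE
)

# one Kaplan-Meier curve per dose category
fit1 <- survfit(Surv(fw_year, status) ~ current_dos, data = df1)

# dput() prints R code that recreates df1, which others can paste into a console
dput(df1)
```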

                current_dos=no-use 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   21    480       1    0.998 0.00208        0.994        1.000
  189    447       1    0.996 0.00305        0.990        1.000
  203    444       1    0.993 0.00378        0.986        1.000
  208    443       1    0.991 0.00438        0.983        1.000

My question is: how can I obtain Kaplan-Meier estimates for each drug category at 1, 5 and 10 years of follow-up (the year column)?
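One detail matters before using `summary(fit, times = ...)`. The `times` values are in the same unit as the time passed to `Surv()`. The output above has times such as 21 and 189, which look like days rather than the years in `fw_year`; this is an inference from the numbers. If that is right, `times = c(1, 5, 10)` would refer to days. Below is a sketch using the `lung` data from the survival package, whose times are in days, converted to years first. Because `lung` has less than three years of follow-up, the example uses 0.5, 1 and 2 years.

```r
library(survival)

# lung records follow-up in days; divide by 365.25 so time is in years
fit <- survfit(Surv(time / 365.25, status) ~ sex, data = lung)

# Kaplan-Meier estimates for each group at 0.5, 1 and 2 years
# (with the question's data this would be times = c(1, 5, 10))
s <- summary(fit, times = c(0.5, 1, 2))
est <- data.frame(strata = s$strata, time = s$time,
                  n.risk = s$n.risk, surv = s$surv,
                  lower = s$lower, upper = s$upper)
print(est)
```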

I am trying to show the estimates for years 1, 5 and 10 separately. Do you know how I can alter the code to do this? — Beum, Nov 08 '18 at 14:40
Do you want the 1-, 5- and 10-year information in separate data frames, or in columns of one data frame? — Mike, Nov 08 '18 at 14:42
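The first option in that question, separate data frames for each time point, can be sketched with base R's `split()`. This example uses the `lung` data and the same example time points (in days) as the answer. It is an illustration, not code from the thread.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = lung)
s <- summary(fit, times = c(150, 365, 800))

# long format: one row per group and time point
long <- data.frame(strata = as.character(s$strata), time = s$time,
                   n.risk = s$n.risk, surv = s$surv,
                   lower = s$lower, upper = s$upper,
                   stringsAsFactors = FALSE)

# a list holding one data frame per requested time point
by_time <- split(long, long$time)
by_time[["365"]]
```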

score 1 · Answer 1 · answered Nov 08 '18 at 15:03


In the future, please post your data in a format that people can copy and paste into their console; `dput()` is the usual way to do this. If you want each column to hold a survival estimate for one group, first get the fit and put the summary results into a data.frame. Then reshape (spread) the results to wide format. If you switch the `dcast` formula from `time ~ strata` to `strata ~ time`, each drug category becomes a row (your drug is in a column) and the time points go into the column names instead.

library(survival)
library(data.table)
library(dplyr)

# lung ships with survival; sex is coded 1 = male, 2 = female
fit1 <- survfit(Surv(time, status) ~ sex, data = lung)
# get estimates at specific time points
# these are example time points (in days) for the lung data;
# for your data replace them with times = c(1, 5, 10)
sum_fit1 <- summary(fit1, times = c(150, 365, 800))
# put the relevant components into a data.frame, selecting them by name
# because their positions in the summary object can differ between survival versions
fit1_df <- data.frame(sum_fit1[c("time", "n.risk", "n.event", "surv",
                                 "std.err", "lower", "upper", "strata")],
                      stringsAsFactors = FALSE) %>%
  # relabel the strata to make them more readable
  mutate(strata = ifelse(strata == "sex=1", "Male", "Female"))
# reshape wide: one row per time point, one column per statistic and group
fit1_df2 <- dcast(
  setDT(fit1_df),
  time ~ strata,
  value.var = c("n.risk", "n.event", "surv", "std.err", "lower", "upper"))
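The `strata ~ time` variant mentioned above can be sketched as follows. It has one row per group and one column per statistic and time point; data.table names these columns like `surv_150`. The example reuses the `lung` data and is self-contained.

```r
library(survival)
library(data.table)

fit <- survfit(Surv(time, status) ~ sex, data = lung)
s <- summary(fit, times = c(150, 365, 800))

long <- data.table(strata = as.character(s$strata), time = s$time,
                   surv = s$surv, lower = s$lower, upper = s$upper)

# strata ~ time: groups as rows, time points spread across the columns
wide <- dcast(long, strata ~ time, value.var = c("surv", "lower", "upper"))
print(wide)
```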

Mike

In this part of the code, `mutate(strata = ifelse(strata == "sex=1", ...))`, should I put in the dosages of my drug? – Beum Nov 08 '18 at 15:26
Yes, I would put them there; that line just cleans up the labels in the strata variable. – Mike Nov 08 '18 at 15:28
`fit1_df <- data.frame(sum_fit1[c(2:6,8:11)],stringsAsFactors = FALSE) %>% mutate(strata = ifelse(strata == "no-use",">0-4.9 mg","5.0-7.4 mg","7.5-14.9 mg","15.0-24.9 mg",">=25 mg" ))` This is what I put into RStudio, but it came out with an evaluation error: unused arguments ("7.5-14.9 mg", "15.0-24.9 mg", ">=25 mg"). I don't understand whether I have written the code correctly. – Beum Nov 08 '18 at 15:58
You have to nest your `ifelse` statements (see https://stackoverflow.com/questions/18012222/nested-ifelse-statement for how to do that). Also, can you run this and post the output: `dput(data.frame(sum_fit1[c(2:6,8:11)], stringsAsFactors = FALSE))` – Mike Nov 08 '18 at 16:11
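The "unused arguments" error occurs because `ifelse()` accepts exactly three arguments: `test`, `yes` and `no`. Mapping several categories therefore needs nested calls. Below is a sketch of both that approach and a simpler alternative. It assumes the strata labels have the `current_dos=...` form shown in the question's output. The short `strata` vector is a hypothetical example. Because those labels already contain the dose, stripping the prefix with `sub()` may be all that is needed.

```r
# hypothetical example: strata labels as survfit produces them for ~ current_dos
strata <- c("current_dos=no-use", "current_dos=>0-4.9 mg",
            "current_dos=7.5-14.9 mg")

# ifelse() takes only (test, yes, no), so extra categories need nested calls
label_nested <- ifelse(strata == "current_dos=no-use", "no-use",
                ifelse(strata == "current_dos=>0-4.9 mg", ">0-4.9 mg",
                       "other"))

# simpler: drop the "current_dos=" prefix and keep the dose text as-is
label_sub <- sub("^current_dos=", "", strata)
print(label_sub)
```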
