I'm trying to write a function that wraps a few lines of code and lets me input a single variable. The code below creates an object using the Surv function (survival package). The second line takes the variable in question, in this case a column named Variable_X, and fits survival curves that can then be visualized using ggsurvplot. The output is a Kaplan-Meier survival curve. What I'd like is a function such that I can type f(Variable_X) and have the KM curve visualized for whichever column I choose from the data. I want f(y) to output the KM curve as if I had put y where ~Variable_X currently is. I'm new to R and very new to how functions work. I've tried the code below, but it obviously doesn't work. I'm working through DataCamp and reading posts, but I'm having a hard time with it. I appreciate any help.

surv_object <- Surv(time = KMeier_DF$Followup_Duration, event = KMeier_DF$Death_Indicator)

fitX <- survfit(surv_object ~ Variable_X, data = KMeier_DF)

ggsurvplot(fitX, data = KMeier_DF, pval = TRUE)

 f<- function(x) {
 dat<-read.csv("T:/datafile.csv")
 KMeier_DF < - dat
 surv_object <- Surv(time = KMeier_DF$Followup_Duration, event = 
 KMeier_DF$Death_Indicator)
 fitX<-survfit(surv_object ~ x, data = KMeier_DF)
 PlotX<- ggsurvplot(fitX, data = KMeier_DF, pval = TRUE)
 return(PlotX)
}
Mike C

2 Answers

The crux of your problem is a tough stumbling block when you're starting out: how to pass variable or data frame column names into a function. I created some example data. In the example below I pass the function four arguments, one of which is your data. You can see two ways I refer to the columns, [[ ]] and [ , ], which you can think of as equivalent to $. Outside of functions they are, but not inside: with $, the text after it is taken literally as the column name, so df$varb1 looks for a column actually called varb1, whereas df[[varb1]] uses the string stored in varb1. The print calls are just there to show you the data along the way. If objects such as surv_object already exist in your global environment, remove them one by one with rm(surv_object), or clear them all with rm(list = ls()).

library(survival) # provides Surv() and survfit()

duration <- c(1, 3, 4, 3, 3, 4, 2)
di <- c(1, 1, 0, 0, 0, 0, 1)
color <- c(1, 1, 2, 2, 3, 3, 4)
KMdf <- data.frame(duration, di, color)

testfun <- function(df, varb1, varb2, varb3) {
  surv_object <- Surv(time = df[[varb1]], event = df[ , varb2])
  print(surv_object)
  fitX <- survfit(surv_object ~ df[[varb3]], data = df)
  print(fitX)
#  plotx <- ggsurvplot(fitX, data = df, pval = TRUE) # this gives an error that surv_object is not found
#  return(plotx)
}

testfun(KMdf, "duration", "di", "color") # note the quotes here; without them you'll get an "object not found" error.
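
To make the difference between $ and [[ ]] inside a function concrete, here is a minimal sketch in base R. The helper names pick_dollar and pick_bracket and the toy data frame are made up for illustration; they are not part of the example above.

```r
# Toy data; the column name is chosen for illustration only.
toy <- data.frame(color = c(1, 1, 2))

# Hypothetical helper: `$` takes the text after it literally, so this looks
# for a column called "varb", not for the column named in the argument.
pick_dollar <- function(df, varb) df$varb

# Hypothetical helper: `[[ ]]` evaluates varb and uses the string it holds.
pick_bracket <- function(df, varb) df[[varb]]

res_dollar  <- pick_dollar(toy, "color")   # NULL: there is no column "varb"
res_bracket <- pick_bracket(toy, "color")  # the values of the color column

print(res_dollar)
print(res_bracket)
```

This is why the function above indexes with df[[varb1]] and df[ , varb2] and takes the column names as quoted strings.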

And even better, you have an even tougher stumbling block: how R handles variables and where it looks for them. From what I can tell, you're running into it because of a possible bug in ggsurvplot, which seems to look for variables in the global environment rather than inside the function. They closed the issue, but as far as I can tell, the problem is still there. When you try to run the ggsurvplot line, you'll get the same error you would get if the variable didn't exist:

Error in eval(inp, data, env) : object 'surv_object' not found.

Hopefully that helps. I'd submit a bug report if I were you.
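
The "object not found" message is what R reports whenever a name is looked up in a place where it was never defined. Here is a minimal base R sketch of that situation. It only illustrates the lookup rules and makes no claim about what ggsurvplot does internally; the names make_formula and local_response are hypothetical.

```r
# Hypothetical function: local_response is defined only inside this call.
make_formula <- function() {
  local_response <- c(1, 2, 3)
  local_response ~ 1
}

fm <- make_formula()

# The formula keeps a reference to the environment it was created in,
# so the variable can still be found there.
in_formula_env <- exists("local_response",
                         envir = environment(fm), inherits = FALSE)

# Looking up the same name starting from the global environment fails.
msg <- tryCatch(
  eval(fm[[2]], envir = globalenv()),
  error = function(e) conditionMessage(e)
)

print(in_formula_env)
print(msg)
```

Any code that re-evaluates the pieces of a formula somewhere other than the formula's own environment can hit the same kind of error.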

edit

I was hoping this solution would help, but it doesn't.

testfun <- function(df, varb1, varb2, varb3) {
  surv_object <- Surv(time = df[[varb1]], event = df[,varb2])
  print(surv_object)
  fitX <- survfit(surv_object ~ df[[varb3]], data = df)
  print(fitX)
  attr(fitX[['strata']], "names") <- c("color = 1", "color = 2", "color = 3", "color = 4")
  plotx <- ggsurvplot(fitX, data = df, pval = TRUE) # this gives an error that surv_object is not found
  return(plotx)
}

Error in eval(inp, data, env) : object 'surv_object' not found
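
One commonly suggested way around this kind of error is to skip the intermediate surv_object and build the whole formula from the column names, so that every variable in it can be found in the data frame. The following is a sketch under assumptions, not a confirmed fix: km_fit and km_plot are hypothetical helpers, and overwriting fit$call$formula is a workaround whose effect on ggsurvplot may depend on the survminer version installed.

```r
library(survival)

KMdf <- data.frame(
  duration = c(1, 3, 4, 3, 3, 4, 2),
  di       = c(1, 1, 0, 0, 0, 0, 1),
  color    = c(1, 1, 2, 2, 3, 3, 4)
)

# Hypothetical helper: column names are passed as strings.
km_fit <- function(df, time_col, event_col, group_col) {
  # Build e.g. Surv(duration, di) ~ color so all names resolve inside df.
  fml <- as.formula(
    paste0("Surv(", time_col, ", ", event_col, ") ~ ", group_col)
  )
  fit <- survfit(fml, data = df)
  # Store the formula itself in the call instead of the local name `fml`
  # (assumption: downstream code re-reads the formula from fit$call).
  fit$call$formula <- fml
  fit
}

# Hypothetical helper: needs the survminer package to be installed.
km_plot <- function(df, time_col, event_col, group_col) {
  fit <- km_fit(df, time_col, event_col, group_col)
  survminer::ggsurvplot(fit, data = df, pval = TRUE)
}

fit <- km_fit(KMdf, "duration", "di", "color")
print(fit)
```

If survminer is installed, calling km_plot(KMdf, "duration", "di", "color") is then the function-style equivalent of the three original lines, with the grouping column chosen by name.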
Anonymous coward

This is homework, right?

First, you need to try to run the code before you provide it as an example. Your example has several fatal errors. ggsurvplot() needs either a library call to survminer or to be summoned as follows: survminer::ggsurvplot().

You have defined a function f, but you never call it. In the function definition, the assignment operator has a wayward space (< - instead of <-). It never would have worked.

I suggest you start by defining a function that calculates the sum of two numbers, or concatenates two strings. Start here or here. Then, you can return to the Kaplan-Meier stuff.

Second, in another class or two, you will need to know the three parts of a function. You will need to understand the scope of a function. You might as well dig into the basics before you start copy-and-pasting.

Third, before you post another question, please read How to make a great R reproducible example?.

Best of luck.

Robert Hadow
  • You had me until you assumed I was a student looking for homework help. I'm a physician who is trying to learn R in my free time so I can better interact with my bioinformatics colleagues. I'm working through some texts, but I also enjoy interactive settings such as this. I've seen and tried functions with simple numeric operations, and I find them more straightforward than having the function refer to a column of a data frame. I will check out your links, and I suppose I'll struggle along on my own until my mastery reaches the point where I can figure it out. I'll be clearer in the future. – Mike C Jun 29 '18 at 12:31
  • Robert Hadow, you can answer a question without resorting to belittling people. [Be nice](https://stackoverflow.com/help/be-nice). @Mike C, providing a small data set to reproduce the error would be helpful in the future. – Anonymous coward Jun 29 '18 at 15:39