I would like to use mlr to run xgboost on right-censored survival data in R. The xgboost documentation lists an objective function, survival:cox, described as:

survival:cox: Cox regression for right censored survival time data (negative values are considered right censored).

mlr 2, which I am using, only supports xgboost for regression and classification learners. If I use the built-in regression learner for xgboost, it uses mse as the evaluation measure. So I tried changing the measure to cindex and got this error:

Measures: cindex cindex
Error in FUN(X[[i]], ...) : Measure cindex does not support task type regr!

So I tried writing a new survival learner for xgboost: a copy of the regression learner with "Regr" changed to "Surv". But a survival task expects the target to have two columns, time and status, and does not accept negative times, whereas xgboost expects a single column, time, and treats any row with a negative time as censored.
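To make the mismatch between the two target formats concrete, here is a minimal base-R sketch of the conversion in both directions. The helper names to_signed_time and from_signed_time and the four toy rows are hypothetical, made up for illustration; they are not part of mlr or xgboost.

```r
# Encode (time, status) as the signed label that survival:cox expects:
# observed events keep a positive time, censored rows get a negative time.
# Assumes strictly positive times, since a sign cannot mark a time of 0.
to_signed_time = function(time, status) {
  stopifnot(all(time > 0), all(status %in% c(0, 1)))
  ifelse(status == 1, time, -time)
}

# Recover the two-column (time, status) representation from a signed label.
from_signed_time = function(label) {
  data.frame(time = abs(label), status = as.integer(label > 0))
}

# Toy values for illustration only (status: 1 = event, 0 = censored).
time   = c(72, 411, 228, 126)
status = c(1, 1, 0, 1)

label = to_signed_time(time, status)
print(label)
print(from_signed_time(label))
```

The two-column form is what makeSurvTask wants, and the signed form is what xgboost wants, so the conversion has to happen somewhere between the task and the call to xgboost.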

Below is what I have tried. Is there any way to achieve this in mlr2 or mlr3?

  1. Using built-in regression learner for xgboost:
    data(veteran)
    veteran_xgb <- veteran
    veteran_xgb <- veteran_xgb[c("trt", "karno", "diagtime", "age", "prior", "time")]
    veteran_xgb$time <- ifelse(veteran$status==1, veteran$time, -veteran$time)

    xgb.task <- makeRegrTask(id="XGBOOST_VET", data = veteran_xgb, target="time")
    xgb_learner <- makeLearner(id="xgboost",
                              cl="regr.xgboost",
                              predict.type = "response",
                              par.vals = list(
                                  objective = "survival:cox",
                                  eval_metric = "cox-nloglik",
                                  nrounds = 200
                                )
                              )

    learners = list(xgb_learner)
    outer = makeResampleDesc("CV", iters=5) # Benchmarking
    bmr = benchmark(learners, xgb.task, outer, show.info = TRUE)
  2. Using custom surv learner for xgboost:
    data(veteran)
    veteran_xgb <- veteran
    veteran_xgb <- veteran_xgb[c("trt", "karno", "diagtime", "age", "prior", "time", "status")]
    veteran_xgb$time <- ifelse(veteran$status==1, veteran$time, -veteran$time)

    xgb.task <- makeSurvTask(id="XGBOOST_VET", data = veteran_xgb, target = c("time", "status"))
    xgb_learner <- makeLearner(id="xgboost",
                              cl="surv.xgboost",
                              predict.type = "response",
                              par.vals = list(
                                  objective = "survival:cox",
                                  eval_metric = "cox-nloglik",
                                  nrounds = 200
                                )
                              )

    learners = list(xgb_learner)
    outer = makeResampleDesc("CV", iters=5) # Benchmarking
    surv.measures = list(cindex)
    bmr = benchmark(learners, xgb.task, outer, surv.measures, show.info = TRUE)

The file RLearner_surv_xgboost.R can be downloaded from OneDrive here https://1drv.ms/u/s!AjTjdzp0sDJRrhZtZF5-HZF2BrBB?e=FNLS94

panda
  • If I understand correctly you're trying to implement a survival version of xgboost? Have you had a look at some of the other survival learners to see how they're implemented? The [tutorial chapter on how to create your own learner](https://mlr.mlr-org.com/articles/tutorial/create_learner.html) should also be helpful. – Lars Kotthoff Jun 24 '19 at 16:40
  • Yes, @Lars Kotthoff, I am trying to create a survival version of xgboost in mlr. Thanks for your response. I do know how to create my own learner, but the main problem, as I described above, is that an mlr survival learner expects the target to have 2 columns, status & time, whereas xgboost expects the target to have only one column, time, and the status is indicated by whether the time is positive or negative. So I cannot see any way to create a survival version of xgboost in mlr. – panda Jun 25 '19 at 04:41
  • Sounds like you can't do this with xgboost then. Maybe [this issue](https://github.com/dmlc/xgboost/issues/386) helps? – Lars Kotthoff Jun 25 '19 at 15:06

1 Answer

I have found the solution and have updated my custom learner here: https://1drv.ms/u/s!AjTjdzp0sDJRrhewy0yx3Wot3FiI?e=sxRrTN

The trick was to modify the trainLearner.surv.xgboost function. Being a survival learner, it is passed data whose target contains the two columns time and status. Within the learner we can compute the target that xgboost expects, with negative times for censored rows, and pass this single-column target to xgboost:

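trainLearner.surv.xgboost = function(.learner, .task, .subset, .weights = NULL, ...) {
  # Hyperparameters set on the learner arrive through `...`
  parlist = list(...)

  # Default to the Cox objective and its matching metric when none is given
  if (is.null(parlist$objective)) {
    parlist$objective = "survival:cox"
    parlist$eval_metric = "cox-nloglik"
  }

  # With target.extra = TRUE, features are in $data and the
  # two-column survival target (time, status) is in $target
  task.data = getTaskData(.task, .subset, target.extra = TRUE)

  # Build the single-column label xgboost expects:
  # positive time for events, negative time for censored rows
  survtime = ifelse(task.data$target$status == 1,
                    task.data$target$time,
                    -task.data$target$time)

  parlist$data = xgboost::xgb.DMatrix(data = data.matrix(task.data$data), label = survtime)

  if (!is.null(.weights))
    xgboost::setinfo(parlist$data, "weight", .weights)

  # Track the training loss if no watchlist was supplied
  if (is.null(parlist$watchlist))
    parlist$watchlist = list(train = parlist$data)

  do.call(xgboost::xgb.train, parlist)
}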
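
The learner also needs a predict method, which is not shown above. A minimal sketch of what it might look like, assuming the trained booster is stored in .model$learner.model (as mlr does for its wrapped models) and that the features are numeric; the linked file may differ. For survival:cox, xgboost returns predictions on the hazard ratio scale, so larger values mean higher risk.

```r
# Sketch only (an assumption, not the code from the linked file):
# predict method for the custom surv.xgboost learner.
predictLearner.surv.xgboost = function(.learner, .model, .newdata, ...) {
  # The object returned by trainLearner.surv.xgboost
  booster = .model$learner.model
  # Convert the feature data.frame to a numeric matrix, as in training,
  # and let predict() dispatch to the booster's own predict method
  predict(booster, newdata = data.matrix(.newdata), ...)
}
```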

Then to use this new learner:

library(xgboost)
library(survival)
library(mlr)
source("RLearner_surv_xgboost.R")

data(veteran)
veteran.xgb <- veteran[, !(names(veteran) %in% c("celltype"))]

xgb.task <- makeSurvTask(id="XGBOOST_VET", data = veteran.xgb, target = c("time", "status"))
surv.measures = list(cindex)
outer= makeResampleDesc("CV", iters=5)

xgb.learner <- makeLearner(id="xgboost",
                          cl="surv.xgboost",
                          predict.type = "response")
learners = list(xgb.learner)
bmr = benchmark(learners, xgb.task, outer, surv.measures, show.info = TRUE)
panda