1

I'm trying to do survival analysis using decision trees in rpart, similar to here: Using a survival tree from the 'rpart' package in R to predict new observations . To compare the decision tree survival model to other models, such as Cox regression, I'd like to use cross-validation to get Dxy and compare the c-index. When I try to use validate.rpart with an rpart fit that includes a Surv object I get an error. Borrowing the example from the previous question:

library(rms)

# Make Data:
set.seed(4)
dat = data.frame(X1 = sample(x = c(1,2,3,4,5), size = 100, replace=T))
dat$t = rexp(100, rate=dat$X1)
dat$t = dat$t / max(dat$t)
dat$e = rbinom(n = 100, size = 1, prob = 1-dat$t )

# Survival Fit:
sfit = survfit(Surv(t, event = e) ~ 1, data=dat)
plot(sfit)

# Tree Fit:
require(rpart)
tfit = rpart(formula = Surv(t, event = e) ~ X1 , data = dat, model=TRUE, control=rpart.control(minsplit=30, cp=0.01))
plot(tfit); text(tfit)
validate(tfit)

The error:

Error in unclass(x)[i, , drop = FALSE] : 
(subscript) logical subscript too long

Any idea for a workaround for this problem? Is there any other way to get the c-index from an rpart survival model?

Community
  • 1
  • 1
Rohan Adur
  • 11
  • 2

1 Answers1

2

The R rms package validate.rpart function does not implement survival models (which are in effect simple exponential distribution models) at present. I have improved the code to do this, and this functionality will be in the next release of the rms package to CRAN in a few weeks. New source code can be obtained at https://github.com/harrelfe/rms by tomorrow but that won't help very much because validate.rpart is a method.

Do note that the sample size for recursive partitioning can be excessive, e.g., 100,000 subjects in some cases, for the regression tree to be reliable and stable.

Frank Harrell
  • 1,954
  • 2
  • 18
  • 36
  • Thank you for the code update, it appears to give me a reasonable result now, but what do you mean by "that won't help very much because validate.rpart is a method"? It seemed to help very much indeed! If you were referring to the data in my example, I actually have 250,000 subjects in my production data. – Rohan Adur May 09 '16 at 20:45
  • Sorry I didn't write that clearly. What I meant was that if you just source() in the new version of the function, without rebuilding the package or sourcing in many more code files in the package, the system would ignore `validate.rpart`'s new version. – Frank Harrell May 10 '16 at 02:42
  • If you just add an additional command: `environment(validate.rpart) <- environment(cph)` after `source()`-ing that new code from rms version 4.5-1, then all is well. – IRTFM May 24 '16 at 17:25