this post follows this question : https://stackoverflow.com/questions/31234329/rpart-user-defined-implementation
I'm very interested in tools which could handle tree growing with customized criteria, such that I could test different model.
I tried to use the partykit R package to grow a tree for which the split rule is given by the negative log-likelihood of a Cox model (which is log-quasi-likelihood in case of the Cox model) and a Cox model is fitted in each leaf.
As I understood reading the vignette about the MOB function, there are two way to implement my own split criteria, namely to get the fit function return either a list or a model object.
For my purpose, I tried the two solutions but I failed to make it work.
Solution 1 : return a list object :
I take as an example the "breast cancer dataset" as in the "mob" vignette.
I tried this :
cox1 = function(y,x, start = NULL, weights = NULL, offset = NULL, ...,
estfun = FALSE, object = TRUE){
res_cox = coxph(formula = y ~ x )
list(
coefficients = res_cox$coefficients,
objfun = - res_cox$loglik[2],
object = res_cox)
}
mob(formula = Surv(time, cens) ~ horTh + pnodes - 1 | age + tsize + tgrade + progrec +
estrec + menostat ,
data = GBSG2 ,
fit = cox1,
control = mob_control(alpha = 0.0001) )
There is a warning about the singularity of the X matrix, and the mob function a tree with a single node (even with smaller values for alpha).
Note that there is no singularity problem with the X matrix when running the coxph function :
res_cox = coxph( formula = Surv(time, cens) ~ horTh + pnodes ,
data = GBSG2 )
Solution 2 : Return a coxph.object :
I tried this :
cox2 = function(y,x, start = NULL, weights = NULL, offset = NULL, ... ){
res_cox = coxph(formula = y ~ x )
}
logLik.cox2 <- function(object, ...)
structure( - object$loglik[2], class = "logLik")
mob(formula = Surv(time, cens) ~ horTh + pnodes - 1 | age + tsize + tgrade + progrec +
estrec + menostat ,
data = GBSG2 ,
fit = cox2,
control = mob_control(alpha = 0.0001 ) )
So this time I get a split along the "progrec" variable :
Model-based recursive partitioning (cox2)
Model formula:
Surv(time, cens) ~ horTh + pnodes - 1 | age + tsize + tgrade +
progrec + estrec + menostat
Fitted party:
[1] root
| [2] progrec <= 21: n = 281
| xhorThno xhorThyes xpnodes
| 0.19306661 NA 0.07832756
| [3] progrec > 21: n = 405
| xhorThno xhorThyes xpnodes
| 0.64810352 NA 0.04482348
Number of inner nodes: 1
Number of terminal nodes: 2
Number of parameters per node: 3
Objective function: 1531.132
Warning message:
In coxph(formula = y ~ x) : X matrix deemed to be singular; variable 2
I would like to know what's wrong with my Solution 1.
I also tried a similar thing for a regression problem and get the same result, ending with a single leaf :
data("BostonHousing", package = "mlbench")
BostonHousing <- transform(BostonHousing,
chas = factor(chas, levels = 0:1, labels = c("no", "yes")),
rad = factor(rad, ordered = TRUE))
linear_reg = function(y,x, start = NULL, weights = NULL, offset = NULL, ...,
estfun = FALSE, object = TRUE){
res_lm = glm(formula = y ~ x , family = "gaussian")
list(
coefficients = res_lm$coefficients,
objfun = res_lm$deviance,
object = res_lm )
}
mob( formula = medv ~ log(lstat) + I(rm^2) | zn + indus + chas + nox +
+ age + dis + rad + tax + crim + b + ptratio,
data = BostonHousing ,
fit = linear_reg)
Also I would like to know if there is no problem using a variable for both "fit the model in a node" and "make a split".
Thank you in advance.
I will probably have other questions about partykit functioning.