I'm using 'ctree' to fit a classification tree (categorical response variable with levels New and Replace). With help from other available answers, I forced the model to make its first split on 'Year'. I have four independent variables (including 'Year'), but the model used only one significant variable. So I want to force the model to split the resulting nodes further based on the other variables, too.

I followed the answer by @Achim Zeileis to "How to specify split in a decision tree in R programming?"

...

Decision tree with the 'partykit' package:

library(partykit)
set.seed(123)
tr1 <- ctree(new_ROWTS ~ Year, data = training)
tr2 <- ctree(new_ROWTS ~ Year + STI_OWTS_00 + capacity_per_bed + system_type,
  data = training,
  subset = predict(tr1, type = "node") == 2)
tr3 <- ctree(new_ROWTS ~ Year + STI_OWTS_00 + capacity_per_bed + system_type,
  data = training,
  subset = predict(tr1, type = "node") == 3)
...........
## Extract the raw node structure from all three trees, fix up the node ids: ##
fixids <- function(x, startid = 1L) {
  id <- startid - 1L
  new_node <- function(x) {
    id <<- id + 1L
    if (is.terminal(x)) return(partynode(id, info = info_node(x)))
    partynode(id,
      split = split_node(x),
      kids = lapply(kids_node(x), new_node),
      surrogates = surrogates_node(x),
      info = info_node(x))
  }
  return(new_node(x))
}
no <- node_party(tr1)
no$kids <- list(
  fixids(node_party(tr2), startid = 2L),
  fixids(node_party(tr3), startid = 3L)
)
no
............
## Set up a joint model: ##
d <- model.frame(new_ROWTS ~ Year +
    STI_OWTS_00 + capacity_per_bed + system_type,
  data = training)
tr <- party(no, data = d,
  fitted = data.frame(
    "(fitted)" = fitted_node(no, data = d),
    "(response)" = model.response(d), check.names = FALSE),
  terms = terms(d))
tr <- as.constparty(tr)
## Visualizing ##
plot(tr)

This is the output: the root node [1] (Year) is split into two nodes, before 1998 [2] and after 1998 [3], and node [3] is split into [4] and [5]:
[1] root
|   [2] V2 <= 1 *
|   [3] V2 > 1
|   |   [4] V3 <= 10.52754 *
|   |   [5] V3 > 10.52754 *
Z.Lin
nahalh

1 Answer


It is unclear what you want. It appears that your predictors do not have enough predictive power to be included in the tree. Forcing splits despite the non-significance of their association with the dependent variable is probably not a very good solution.

If you want to see the structure of the tree when allowing splits at less strict significance levels (the default is alpha = 0.05), you can use something like ctree(..., alpha = 0.8). See ?ctree_control for further details. Whether or not the results of such a tree are useful for interpretation and/or prediction is a different question, though.
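As a rough illustration of relaxing the significance level, here is a minimal sketch. It assumes the built-in iris data as a stand-in for the training data from the question (which is not available here), and the object names tr_default and tr_loose are made up for this example.

```r
library(partykit)

# Assumption: iris replaces the asker's `training` data, which is not available.
# Default tree: a split is made only if the adjusted p-value is below alpha = 0.05.
tr_default <- ctree(Species ~ ., data = iris)

# Relaxed tree: alpha = 0.8 also admits splits with much weaker evidence.
tr_loose <- ctree(Species ~ ., data = iris,
                  control = ctree_control(alpha = 0.8))

# Compare the total number of nodes in the two trees.
length(tr_default)
length(tr_loose)
```

Calling print(tr_loose) or plot(tr_loose) then shows which additional splits appear under the looser threshold. Those extra splits are exploratory and should not be read as significant associations.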

Achim Zeileis
Thanks. I used alpha = 0.4 for the second tree (tr2) and the default alpha for tr3. tr2 showed 2 more splits, with the percentages of 'Replace' and 'New'. But tr3 didn't show any percentages and showed empty boxes with n = 0! – nahalh May 06 '19 at 15:55