
The decision tree in our current project uses the Conditional Inference Tree (ctree) algorithm. I can extract the split variables of a binary ctree with the code below:

    library(party)

    #develop ctree decision tree
    prod_discount_data_ctree <- ctree(Discount ~ Prod, data = prod_discount_data,
                                      controls = ctree_control(minsplit = 30))
    plot(prod_discount_data_ctree)

    #levels of the split variable (Prod) stored in the root node's primary split
    lvls <- levels(prod_discount_data_ctree@tree$psplit$splitpoint)

    #levels sent to the left leaf node (splitpoint == 1)
    left.df <- lvls[prod_discount_data_ctree@tree$psplit$splitpoint == 1]

    #levels sent to the right leaf node (splitpoint == 0)
    right.df <- lvls[prod_discount_data_ctree@tree$psplit$splitpoint == 0]

This works fine when the tree has a single split (depth 1), i.e. the root node splits directly into two leaf nodes. But when the root (node 1) splits into two inner nodes (nodes 2 and 5) that in turn split into leaf nodes (node 2 into {3, 4} and node 5 into {6, 7}), how do I traverse deeper and get the split variables for each leaf node? In this example I want the split variables for nodes 3, 4, 6 and 7 as four lists.
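One way to go deeper is to recurse: carry the set of levels that reach a node, split it into the levels sent left (`splitpoint == 1`) and the rest, and stop at terminal nodes. The sketch below assumes the `party` node layout the code above already relies on (`$terminal`, `$psplit$splitpoint`, `$left`, `$right`), plus a `$nodeID` element for naming the leaves, and that every split is on the same single factor, as in `Discount ~ Prod`. `leaf_levels`, `make_split` and `mock_tree` are hypothetical names; the mock list stands in for `prod_discount_data_ctree@tree` so the example runs without fitting a model.

```r
# Hypothetical helper: collect the factor levels that reach each terminal node
# of a party-style binary tree. Assumes each node is a list with $nodeID and
# $terminal, and inner nodes also carry $psplit$splitpoint (0/1 per level,
# with a "levels" attribute; 1 = sent left), $left and $right, and that every
# split is on the same single factor (as in Discount ~ Prod).
leaf_levels <- function(node, lvls) {
  if (isTRUE(node$terminal)) {
    out <- list(lvls)
    names(out) <- as.character(node$nodeID)
    return(out)
  }
  sp <- node$psplit$splitpoint
  # intersect() drops levels flagged at this split that never reach this node
  left_lvls <- intersect(lvls, levels(sp)[sp == 1])
  right_lvls <- setdiff(lvls, left_lvls)
  c(leaf_levels(node$left, left_lvls), leaf_levels(node$right, right_lvls))
}

# Hypothetical mock of the tree in the question: node 1 -> {2, 5},
# node 2 -> {3, 4}, node 5 -> {6, 7}; the levels P1..P6 are made up.
lv <- paste0("P", 1:6)
make_split <- function(go_left) {
  sp <- as.integer(lv %in% go_left)
  attr(sp, "levels") <- lv
  sp
}
mock_tree <- list(
  nodeID = 1, terminal = FALSE,
  psplit = list(splitpoint = make_split(c("P1", "P2", "P3"))),
  left = list(
    nodeID = 2, terminal = FALSE,
    # P5 is flagged left here but never reaches node 2; intersect() ignores it
    psplit = list(splitpoint = make_split(c("P1", "P5"))),
    left = list(nodeID = 3, terminal = TRUE),
    right = list(nodeID = 4, terminal = TRUE)
  ),
  right = list(
    nodeID = 5, terminal = FALSE,
    psplit = list(splitpoint = make_split(c("P4", "P6"))),
    left = list(nodeID = 6, terminal = TRUE),
    right = list(nodeID = 7, terminal = TRUE)
  )
)

res <- leaf_levels(mock_tree, lv)
print(res)
```

On the fitted model the call would presumably be `leaf_levels(prod_discount_data_ctree@tree, levels(prod_discount_data_ctree@tree$psplit$splitpoint))`, giving one character vector per leaf, named by node ID. If your `party` version stores the node ID under a different name, `str(prod_discount_data_ctree@tree, max.level = 1)` shows the actual element names.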

sourav de
  • Which package? partykit, party? – bergant Apr 08 '15 at 21:38
  • I'm using the party package. – sourav de Apr 08 '15 at 21:42
  • Maybe there is something in http://stackoverflow.com/questions/17713275/extracting-predictors-from-ctree-object – bergant Apr 08 '15 at 21:54
  • I would recommend to use the new implementation in the `partykit` package. This has a function `nodeapply()` that does the tree traversal for you and you can easily extract the splitting variables and their split points etc. See http://stackoverflow.com/questions/28456814/ for a worked example and the package's vignettes for technical details. – Achim Zeileis Apr 09 '15 at 08:31
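A sketch in the spirit of the `partykit` suggestion, using a different technique than `nodeapply()`: instead of walking the nodes, it sends one row per factor level down the fitted tree with `predict(type = "node")` and groups the levels by the terminal node they land in. The data frame `dat` and its discount values are made up purely so the example runs; with real data you would substitute `prod_discount_data`. It assumes every level occurs in the training data, so each level is routed deterministically.

```r
# Sketch with partykit instead of party: route one row per factor level
# through the fitted tree and group the levels by terminal node.
library(partykit)

# Hypothetical data: 8 products with four made-up discount regimes
set.seed(1)
prod <- factor(rep(LETTERS[1:8], each = 40))
regime <- c(A = 5, B = 5, C = 10, D = 10, E = 20, F = 20, G = 30, H = 30)
dat <- data.frame(
  Discount = unname(regime[as.character(prod)]) + rnorm(length(prod)),
  Prod = prod
)

ct <- ctree(Discount ~ Prod, data = dat, control = ctree_control(minsplit = 30))

# One row per level; predict(type = "node") returns the terminal node ID
lv <- levels(dat$Prod)
leaf_id <- predict(ct, newdata = data.frame(Prod = factor(lv, levels = lv)),
                   type = "node")

# Named list: one character vector of levels per terminal node
leaf_levels <- split(lv, leaf_id)
print(leaf_levels)
```

Letting the package do the routing avoids reading internal slots, and it works for any depth without extra code.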

1 Answer


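After trying every option I could think of, I finally found a way to traverse a ctree and get the split variables for each leaf node. I'm pasting the code snippet below for future reference. Note that it is hard-coded for a tree of depth 2 (the root, its two children, and their four leaves).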

    library(party)

    if (nrow(SubBrandright_total) > 200) {

      sec_discount_data <- subset(SubBrandright_total, select = c(Discount, Sector))
      sec_discount_data_ctree <- ctree(Discount ~ Sector, data = sec_discount_data,
                                       controls = ctree_control(minsplit = 30))

      sec_lvls_r <- levels(sec_discount_data_ctree@tree$psplit$splitpoint)
      #Testing if the node is terminal [TRUE] or not [FALSE]
      #print(sec_discount_data_ctree@tree$terminal)
      #print(sec_discount_data_ctree@tree$left$terminal)
      #print(sec_discount_data_ctree@tree$left$left$terminal)
      #print(sec_discount_data_ctree@tree$left$right$terminal)

      #Levels sent left at the root (left child of the root)
      sec_left.df <- sec_lvls_r[sec_discount_data_ctree@tree$psplit$splitpoint == 1]
      #Levels sent left again at that child; intersect() keeps only levels that actually reached it
      sec_left_left.df <- intersect(sec_left.df,
                                    sec_lvls_r[sec_discount_data_ctree@tree$left$psplit$splitpoint == 1])

      #Using setdiff to get the right leaf node: left child minus its left leaf node
      sec_left_right.df <- setdiff(sec_left.df, sec_left_left.df)

      print("Sector Segmentation")
      print(sec_left_left.df)
      print(sec_left_right.df)

      #Levels sent right at the root (right child of the root)
      sec_right.df <- sec_lvls_r[sec_discount_data_ctree@tree$psplit$splitpoint == 0]
      #Levels sent left at the right child, restricted to levels that actually reached it
      sec_right_left.df <- intersect(sec_right.df,
                                     sec_lvls_r[sec_discount_data_ctree@tree$right$psplit$splitpoint == 1])

      #Using setdiff to get the right leaf node: right child minus its left leaf node
      sec_right_right.df <- setdiff(sec_right.df, sec_right_left.df)
      print(sec_right_left.df)
      print(sec_right_right.df)

    }
sourav de