
I have a data set with 6 categorical variables, each with between 5 and 28 levels. I fitted a tree with ctree() (party package) and obtained 17 terminal nodes. To get my desired output, I followed the approach by @Galled from "ctree() - How to get the list of splitting conditions for each terminal node?".

However, I get the following error after running the code:

```
Error in data.frame(ResulTable, Means, Counts) : 
  arguments imply differing number of rows: 17, 2
```

I have tried adding these extra lines:

```r
ResulTable <- rbind(ResulTable, cbind(Node = Node, Path = Path2))

ResulTable$Node <- rownames(ResulTable)

melt(ResulTable)
```

but with no success so far. Any pointers on where it is going wrong?

Debbie

1 Answer


I would recommend using the new partykit implementation of ctree() rather than the old party package. Then you can use the function .list.rules.party(). It is not officially exported yet, but it can be used to extract the desired information.

```r
library("partykit")
airq <- subset(airquality, !is.na(Ozone))
ct <- ctree(Ozone ~ ., data = airq)
partykit:::.list.rules.party(ct)
##                                      3                                      5 
##             "Temp <= 82 & Wind <= 6.9" "Temp <= 82 & Wind > 6.9 & Temp <= 77" 
##                                      6                                      8 
##  "Temp <= 82 & Wind > 6.9 & Temp > 77"             "Temp > 82 & Wind <= 10.3" 
##                                      9 
##              "Temp > 82 & Wind > 10.3" 
```
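The question was ultimately after a table combining each terminal node's path with its mean and count. A minimal sketch of how that might be assembled from the rules above, assuming a numeric response and a tree fitted with `partykit::ctree`. The names `result`, `node`, `counts`, and `means` are illustrative, and `.list.rules.party()` is unexported, so its behaviour may change between versions:

```r
library("partykit")

airq <- subset(airquality, !is.na(Ozone))
ct <- partykit::ctree(Ozone ~ ., data = airq)

# Splitting rules, named by terminal node id (unexported helper)
rules <- partykit:::.list.rules.party(ct)

# Terminal node id for each observation in the learning sample
node <- predict(ct, type = "node")

# Per-node summaries of the response; tapply() names them by node id
counts <- tapply(airq$Ozone, node, length)
means  <- tapply(airq$Ozone, node, mean)

# Align everything on the node ids carried by the rules vector,
# so every column has one entry per terminal node
result <- data.frame(
  Node  = as.integer(names(rules)),
  Path  = unname(rules),
  Count = as.vector(counts[names(rules)]),
  Mean  = as.vector(means[names(rules)]),
  stringsAsFactors = FALSE
)
result
```

Indexing the summaries by `names(rules)` keeps all columns the same length, which avoids the "differing number of rows" error that arises when the pieces are built separately.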
Achim Zeileis
  • Thank you for your prompt reply. With the above code, I'm getting this error: Error in UseMethod("nodeids") : no applicable method for 'nodeids' applied to an object of class "c('BinaryTree', 'BinaryTreePartition')" – Debbie May 02 '15 at 08:36
  • Then you have fitted your tree with `party::ctree`, not with `partykit::ctree`. Make sure that you do not load both packages simultaneously. This is bound to lead to confusion... – Achim Zeileis May 02 '15 at 08:42
  • Running ctree with the partykit package (with the default control parameters) takes indefinitely long, whereas running ctree with the party package was much faster. I have a dataset with 100K rows and 6 columns. I'm running R version 3.1.3 on a 32-bit 64 GB machine. Any inputs on this? – Debbie May 04 '15 at 05:21
  • The old `party` implementation could run into numerical problems when comparing p-values from datasets with hundreds of thousands of observations. The new `partykit` implementation uses log-p-values instead, which is numerically more stable. For your data this appears to lead to differences in the splitting, with `partykit` continuing to split for longer. I would recommend not relying on the default values alone, but restricting `mincriterion`, `minbucket`, or `maxdepth` to values that are better suited to your data. – Achim Zeileis May 04 '15 at 17:03
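As a rough illustration of the last comment, the tree can be restricted through `ctree_control()`. This is only a sketch on the small `airquality` example. The specific values below are assumptions chosen for demonstration, not recommendations for the 100K-row data set:

```r
library("partykit")

airq <- subset(airquality, !is.na(Ozone))

# Illustrative settings only: a stricter significance threshold,
# larger terminal nodes, and a cap on the depth of the tree
ctrl <- ctree_control(
  mincriterion = 0.99,  # 1 - p-value that must be exceeded to split
  minbucket    = 20,    # minimum number of observations in a terminal node
  maxdepth     = 3      # maximum depth of the tree
)

# Call the partykit version explicitly to avoid picking up party::ctree
ct_small <- partykit::ctree(Ozone ~ ., data = airq, control = ctrl)

# Trees from partykit inherit from class "party"
inherits(ct_small, "party")

# Size of the resulting tree
depth(ct_small)  # number of levels below the root
width(ct_small)  # number of terminal nodes
```

Calling `partykit::ctree` with the namespace prefix, and checking `inherits(ct_small, "party")`, also guards against the `nodeids` error from the earlier comment, which arises when the object was fitted with `party::ctree`.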