
To reproduce the error and show what I intend to get, here is an example. Say I have the iris dataset and fit a classification tree using:

library(party)
ct <- ctree(Species ~ ., data = iris) #here Species is the categorical response variable
print(ct)
plot(ct)


My question is: how do I get the splitting conditions at every node, as well as the values in the terminal nodes?

I found that with

library(partykit)
partykit:::.ctree_fit(ct)

it's easy to find the split conditions. But this takes more than 8 hours on my dataset, even with the maximum tree depth set to 3.

To summarize:

  1. I need to find the splitting conditions of a tree fitted with library(party).
  2. I also need the values in the terminal nodes of the tree, so that I can use them to define the rules.
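A minimal sketch of both points with `party`, using iris as in the example above. It assumes the tree is fitted with `party::ctree` (so it is an S4 `BinaryTree`), that each node is a list with the fields `terminal`, `nodeID`, `psplit`, `prediction`, `left` and `right`, and that `psplit` carries `variableName` and `splitpoint`. `collect_rules` is a hypothetical helper written for illustration, not part of `party`, and it only spells out numeric (ordered) splits.

```r
library(party)

# Fit with the party implementation explicitly, so ct is an S4 BinaryTree
ct <- party::ctree(Species ~ ., data = iris)

# Terminal node ID of every training observation
node_id <- where(ct)
terminal_ids <- sort(unique(node_id))

# For a factor response, each node's `prediction` holds the class proportions
probs <- t(sapply(nodes(ct, terminal_ids), function(n) n$prediction))
dimnames(probs) <- list(as.character(terminal_ids), levels(iris$Species))
print(round(probs, 3))

# Hypothetical helper: walk the tree and collect the path to each terminal node.
# Only ordered (numeric) splits are written out; nominal splits are just named.
collect_rules <- function(node, path = character(0)) {
  if (node$terminal) {
    rule <- if (length(path)) paste(path, collapse = " & ") else "(root)"
    return(setNames(rule, node$nodeID))
  }
  sp <- node$psplit
  if (isTRUE(as.logical(sp$ordered))) {
    left  <- paste(sp$variableName, "<=", sp$splitpoint)
    right <- paste(sp$variableName, ">", sp$splitpoint)
  } else {
    left  <- paste(sp$variableName, "in left level set")
    right <- paste(sp$variableName, "in right level set")
  }
  c(collect_rules(node$left, c(path, left)),
    collect_rules(node$right, c(path, right)))
}

rules <- collect_rules(ct@tree)
print(rules)
```

The names of `rules` are terminal node IDs, so `rules[rownames(probs)]` lines each rule up with its row of class proportions.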

Note: My dataset is a little more complex than the iris dataset. It has the following structure:

ct <- ctree(Y ~ V1 + V2 + V3, data = MyData, controls = ctree_control(maxdepth = 3)) # Y is a factor; V1 and V3 are continuous; V2 is categorical

When I call where(ct), I get the following error:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘where’ for signature ‘"constparty"’

Please help me get further with this problem.
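The `"constparty"` in the error message suggests that `ct` was created by `partykit::ctree` rather than `party::ctree`, which can happen when `partykit` is attached after `party` and masks `ctree`. A small sketch of the difference, assuming both packages are installed and calling each implementation through its namespace (the names `ct_old` and `ct_new` are only for illustration):

```r
# Fit the same model with both implementations, without relying on attach order
ct_old <- party::ctree(Species ~ ., data = iris)     # S4 object of class "BinaryTree"
ct_new <- partykit::ctree(Species ~ ., data = iris)  # S3 object inheriting from "constparty"

print(class(ct_old))
print(class(ct_new))

# where() is defined for the party object ...
node_old <- party::where(ct_old)

# ... while partykit uses predict(..., type = "node") for the same purpose
node_new <- predict(ct_new, type = "node")

print(table(node_old))
print(table(node_new))
```

Calling `where()` on `ct_new` is what produces the "unable to find an inherited method" error shown above.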

David Arenburg
user3560220
  • Thanks guys. I should have been a little more careful before posting the question. My bad. But there were two parts to my question. The splitting conditions can be found from the above link. How can I find the classification probability values? @42 – user3560220 Jan 25 '16 at 08:12
  • Can you show your desired output too please? `where` should work on `party::ctree`, not sure regarding `partykit` – David Arenburg Jan 25 '16 at 08:19
  • @DavidArenburg Thanks for your quick response. I would like to get something similar to what `partykit` provides. Example: `(pred <- aggregate(predict(ct, type = "prob"), list(predict(ct, type = "node")), FUN = mean)) ## Group.1 setosa versicolor virginica ## 1 2 1 0.00000000 0.00000000 ## 2 5 0 0.97826087 0.02173913 ## 3 6 0 0.50000000 0.50000000 ## 4 7 0 0.02173913 0.97826087` Reference: [link](http://stackoverflow.com/questions/30644908/modifying-terminal-node-in-ctree-partykit-package) – user3560220 Jan 25 '16 at 08:46
  • Does [this](http://stats.stackexchange.com/questions/171301/interpreting-ctree-partykit-output-in-r/171317#171317) help? – David Arenburg Jan 25 '16 at 09:07
  • @DavidArenburg Thanks again! I had already looked into this and tried to adapt it to my problem, but in vain. What I am looking for is simply the classification probabilities in the terminal nodes, e.g. Node 6: class1 - 60%, class2 - 40%, and likewise for all terminal nodes. With that, I only need to traverse the leaf nodes to find the rules. That is my idea. Any clue or method for doing this? My response variable is categorical, with 2 classes – user3560220 Jan 25 '16 at 09:25
  • As @David Arenburg already pointed out: The `where()` function expects an S4 object as produced by `party::ctree`. The new `partykit::ctree` implementation uses S3 instead and simply uses `predict(..., type = "node")` rather than introducing a new generic function. – Achim Zeileis Jan 25 '16 at 11:06
  • Further links that might be useful for processing the structure of `partykit` trees: http://stackoverflow.com/questions/29999626/how-to-extract-the-splitting-rules-for-the-terminal-nodes-of-ctree/30000007#30000007 ; http://stackoverflow.com/questions/21443203/ctree-how-to-get-the-list-of-splitting-conditions-for-each-terminal-node/29999993#29999993 ; http://stackoverflow.com/questions/29618490/get-decision-tree-rule-path-pattern-for-every-row-of-predicted-dataset-for-rpart/29638602#29638602 – Achim Zeileis Jan 25 '16 at 11:07
  • @AchimZeileis Thanks for your guidance on this. I went through the links you posted. They are useful when I am using `library(partykit)`. But this is very slow for my data and runs for more than 8 hours on a Windows machine. Hence I chose `party` over `partykit`, and that is where all the problems started. Is there any way of doing it using `party`? `where()` doesn't actually give the probabilities of the terminal node classes – user3560220 Jan 25 '16 at 11:28
  • You mentioned that you used `partykit:::.ctree_fit(ct)` which takes 8 hours. I don't understand what you are trying to do with this. In any case, this is not the correct way to fit a `ctree()`. – Achim Zeileis Jan 25 '16 at 11:34
  • @AchimZeileis Since you asked, let me post exactly what I am doing. I am trying to fit a ctree to data of the following type: `str(testData)` gives 'data.frame': 410979 obs. of 4 variables: $ ID : Factor w/ 503 levels "6862","10022", $ Km : int 52783 50339 $ MIS : num 23 19 36 33 4 $ Class: Factor w/ 2 levels "0","1". I am basically trying to classify the data into class 1 or 0. My `ctree` formula: `ct <- ctree(Class ~ Km + MIS + ID , data= MyData,controls = ctree_control(maxdepth = 3))# here Class is factor variable , Km & MIS are continuous , ID is categorical variable`. Does that make sense? – user3560220 Jan 25 '16 at 12:00
  • The ID is surely the problem. Finding the best binary split among 503 levels is computationally very intensive (but this should be the same in `party` and `partykit`). I'm also not sure whether the tree will make sense or be useful for predictions if you have 503 different levels. If these are really individuals and you have repeated measurements then this should probably be handled differently. In any case, this has nothing to do with the original question you asked here and is beyond the scope of this comments section. – Achim Zeileis Jan 25 '16 at 12:05
  • @AchimZeileis Thank you for your prompt answer. I will try to find a different way of solving the issue – user3560220 Jan 25 '16 at 12:14
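For reference, the `partykit` route mentioned in the comments can be sketched as follows on iris. It assumes `partykit` is installed and uses the unexported helper `partykit:::.list.rules.party` from the linked answers, which is an internal function and may change between versions.

```r
# Fit with the partykit implementation (S3 "constparty" object)
ct <- partykit::ctree(Species ~ ., data = iris)

# Mean class probabilities per terminal node, as in the comment above
pred <- aggregate(predict(ct, type = "prob"),
                  list(node = predict(ct, type = "node")),
                  FUN = mean)

# Rule path for each terminal node, named by node ID (internal helper)
rules <- partykit:::.list.rules.party(ct)

# Attach the matching rule to each row of class probabilities
pred$rule <- unname(rules[as.character(pred$node)])
print(pred)
```

Within a terminal node every observation gets the same predicted probabilities, so `FUN = mean` only collapses the duplicates into one row per node.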
