I have built a decision tree using the ctree function from the party package. It has 1700 nodes. Firstly, is there a way to pass a maxdepth argument to ctree? I tried the control_ctree option, but it threw an error message saying it couldn't find the function.

Also, how can I consume the output of this tree? How can it be implemented on other platforms such as SAS or SQL? I would also like to know what the value "* weights = 4349" at the end of a node signifies, and how I can tell which terminal node votes for which predicted value.

David Arenburg
  • Please simplify your question first by giving an example of the function call you actually wrote in your R console. – Aashu Aug 23 '13 at 10:36
  • library(party)
    train.treeM1 <- ctree(U_ACTIVITY_FLAG_STATUS_3 ~ U_ARPU_M1 + U_RCHRG_CNT_M1 +
        U_LOCAL_TOT_MOU_M1 + U_OG_CALL_CNT_M1 + U_OG_AVG_CALL_DURATION_M1 +
        U_IC_CALL_CNT_M1 + U_IC_AVG_CALL_DURATION_M1 + U_DED_RECHARGE_RATIO +
        U_Advanced_Handset_Ratio + U_Retailer_Baby_Care_Ratio +
        U_Retailer_Born_Dead_Ratio, data = traindata)
    table(traindata$U_ACTIVITY_FLAG_STATUS_3, predict(train.treeM1))
    #plot(train.treeM1, type = "simple")
    #plot(train.treeM1)
    #summary(train.treeM1)
    – Kshitij Kashayp Aug 26 '13 at 12:35
  • This is the code I have used, and it has created a tree. Now I want the output of this tree, which is in the format shown below, to be implemented in SAS/SQL. – Kshitij Kashayp Aug 26 '13 at 12:37
  • 1) U_OG_CALL_CNT_M1M2 <= 13; criterion = 1, statistic = 53104.0
    2) U_DED_RECHARGE_RATIO <= 0; criterion = 1, statistic = 11833.82
    3) U_OG_CALL_CNT_M1M2 <= 5; criterion = 1, statistic = 10453.2
    4) U_IC_CALL_CNT_M1M2 <= 3; criterion = 1, statistic = 7124.4
    5) U_IC_CALL_CNT_M1M2 <= 1; criterion = 1, statistic = 3304.2
    6) U_Retailer_Born_Dead_Ratio <= 0.14; criterion = 1, statistic = 2241.2
    7) U_OG_CALL_CNT_M1M2 <= 0; criterion = 1, statistic = 665.931
    8) U_RCHRG_CNT_M1M2 <= 0; criterion = 1, statistic = 1621.802
    9) U_IC_CALL_CNT_M1M2 <= 0; criterion = 1, statistic = 1680.226
    10)* weights = 4349
    – Kshitij Kashayp Aug 26 '13 at 12:39
  • The output goes on for some 1700 lines. Can anyone tell me how to work out which of the possible outputs each terminal node is voting for? – Kshitij Kashayp Aug 26 '13 at 12:40
  • Adding a bit of example data so people can toy with this code might increase your chances of obtaining an answer. So create some arbitrary dummy data. dput can be very useful for turning your data into a textual representation that can be restored easily, making your whole code snippet an SSCCE. Please edit it into your question. – Aashu Aug 26 '13 at 12:55

1 Answer

There is a maxdepth option in ctree. It is set through ctree_control() (note that the function is named ctree_control, not control_ctree).

You can use it as follows:

library(party)
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))

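You can also set lower bounds on the split and bucket sizes: minsplit is the minimum sum of weights a node needs in order to be considered for splitting, and minbucket is the minimum sum of weights in a terminal node: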

airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(minsplit= 50, minbucket = 20))

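You can also make the tree less eager to split by raising mincriterion, which lowers the p-value a split has to reach (with the default test type, mincriterion is 1 - p-value, so 0.99 corresponds to p < 0.01):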

airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(mincriterion = 0.99))

The weights = 4349 you mentioned is simply the number of observations in that specific node. By default ctree gives every observation a weight of 1, so the sum of weights in a node equals its observation count. If you feel that some observations deserve bigger weights, you can pass a weights vector to ctree(); it has to be the same length as the data set and consist of non-negative integers. Once you do that, weights = 4349 is a sum of weights rather than a row count and has to be interpreted with caution.
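
To make the link between the printed weights and the row count concrete, here is a minimal sketch on the same airquality example, assuming the default weights of 1. The variable names (node_ids, id, n_obs, w_sum) are only illustrative.

```r
library(party)

airq  <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))

node_ids <- where(airct)           # terminal node ID for every training row
id       <- unique(node_ids)[1]    # pick one terminal node to inspect

n_obs <- sum(node_ids == id)                     # rows that ended up in that node
w_sum <- sum(nodes(airct, id)[[1]]$weights)      # sum of the weights stored in the node

# With the default weights these two numbers should agree
c(n_obs = n_obs, w_sum = w_sum)
```

If a custom weights vector were supplied, the two numbers would no longer need to match, which is why the printed value then needs care.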

One way of using the weights is to see which observations fell in a certain node. Using the data from the example above, we can do the following:

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))
unique(where(airct)) # IDs of the terminal nodes
[1] 5 3 6 9 8

So we can check what fell in node number 5, for example:

n <- nodes(airct, 5)[[1]]                   # extract node 5 from the tree
x <- airq[which(as.logical(n$weights)), ]   # rows with a non-zero weight in that node
x
    Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
2      36     118  8.0   72     5   2
3      12     149 12.6   74     5   3
4      18     313 11.5   62     5   4
...

Using this method you can create data sets that contain the information of your terminal nodes and then import them into SAS or SQL.
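
As a rough sketch of that idea, the snippet below attaches the terminal node ID and the fitted prediction to every training row, builds a small per-node summary (which also shows which value each terminal node predicts), and writes the rows to a CSV file. The file name and the names scored, node_summary and out_file are assumptions made for illustration; for a classification tree such as the one in the question, the prediction column would hold the predicted class instead of a mean.

```r
library(party)

airq  <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))

scored            <- airq
scored$node       <- where(airct)               # terminal node ID per row
scored$prediction <- as.vector(predict(airct))  # fitted value per row

# One row per terminal node: its prediction and its size
node_summary   <- aggregate(prediction ~ node, data = scored,
                            FUN = function(p) p[1])
node_summary$n <- as.vector(table(scored$node)[as.character(node_summary$node)])
node_summary

# Hypothetical output location; adjust to wherever SAS or the database can read it
out_file <- file.path(tempdir(), "airct_nodes.csv")
write.csv(scored, out_file, row.names = FALSE)
```

The CSV can then be loaded with whatever import mechanism the target platform offers.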

You can also get the list of splitting conditions using the function from my answer to "ctree() - How to get the list of splitting conditions for each terminal node?"
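
Once you have one condition string per terminal node, turning them into SQL is mostly string pasting. The sketch below uses base R only and assumes the rules are already in a data frame. The conditions, predictions and column names shown are made up for illustration and are not taken from a fitted tree.

```r
# Hypothetical rules, one row per terminal node (NOT output of a real tree)
rules <- data.frame(
  node       = c(3, 5),
  condition  = c("Temp <= 82 AND Wind > 6.9", "Temp > 82"),
  prediction = c(20.1, 70.5),
  stringsAsFactors = FALSE
)

# One WHEN line per terminal node
when_lines <- sprintf("  WHEN %s THEN %s", rules$condition, rules$prediction)

# Assemble a CASE expression; "predicted_value" is an assumed column alias
sql <- paste(c("CASE", when_lines, "  ELSE NULL", "END AS predicted_value"),
             collapse = "\n")
cat(sql, "\n")
```

The same WHEN lines could be rewritten as IF/ELSE IF statements for a SAS data step.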

David Arenburg