There is a maxdepth option in ctree. It is set through ctree_control(). You can use it as follows:
library(party)
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))
You can also set lower bounds on node sizes: minsplit is the minimum size a node must have to be considered for splitting, and minbucket is the minimum size of a terminal node:
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(minsplit = 50, minbucket = 20))
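You can also make the splitting test stricter by raising mincriterion, which lowers the p-value required for a split (mincriterion is 1 minus the p-value, so 0.99 corresponds to p < 0.01):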
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(mincriterion = 0.99))
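To see what mincriterion does in practice, here is a rough sketch that refits the tree at a few thresholds and counts the terminal nodes each time. The helper name n_leaves and the threshold values are my own choices for illustration, not anything built into party.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))

# Hypothetical helper: fit a tree with a given mincriterion and
# return how many terminal nodes it ends up with.
n_leaves <- function(mc) {
  fit <- ctree(Ozone ~ ., data = airq,
               controls = ctree_control(mincriterion = mc))
  length(unique(where(fit)))
}

# Illustrative thresholds; 0.95 is the package default
leaf_counts <- sapply(c(0.90, 0.95, 0.99), n_leaves)
leaf_counts
```

A higher mincriterion should generally give the same number of terminal nodes or fewer, since each split has to pass a stricter test.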
The weights = 4349 you mentioned is just the number of observations in that specific node. By default ctree gives every observation a weight of 1, but if you feel that some observations deserve bigger weights you can pass a weights vector to ctree(). It has to be the same length as the data set and contain only non-negative integers. After you do that, weights = 4349 has to be interpreted with caution, because it is then a sum of weights rather than a count of observations.
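A minimal sketch of passing a weights vector, using the airquality data. The weighting scheme (counting July and August observations twice) is made up purely for illustration.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))

# Hypothetical weighting scheme, chosen only for illustration:
# observations from July and August count twice.
w <- ifelse(airq$Month %in% c(7, 8), 2L, 1L)

# weights must be non-negative integers, one per row of the data
stopifnot(length(w) == nrow(airq), all(w >= 0))

airct_w <- ctree(Ozone ~ ., data = airq, weights = w,
                 controls = ctree_control(maxdepth = 3))

# Printing the tree shows a "weights = ..." figure for each node
airct_w
```

With a scheme like this, the weights figure printed for a node no longer needs to match the number of rows that fell into it.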
One way of using weights is to see which observations fell in a certain node. Using the data from the example above, we can do the following:
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))
unique(where(airct)) # IDs of the terminal nodes
[1] 5 3 6 9 8
So we can check, for example, which observations fell in node 5:
n <- nodes(airct , 5)[[1]]
x <- airq[which(as.logical(n$weights)), ]
x
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
...
Using this method you can create data sets that contain the observations of your terminal nodes and then import them into SAS or SQL.
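As a sketch of that export step, one option is to attach the terminal node ID to every row with where() and write the result to a CSV file, which SAS or a SQL database can then load. The column name node and the output file name are placeholders I picked for the example.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))

# where() returns the terminal node ID of every training observation
airq$node <- where(airct)

# One data frame per terminal node, in case separate tables are wanted
by_node <- split(airq, airq$node)

# Write everything to a single CSV file (placeholder file name)
out_file <- file.path(tempdir(), "airq_nodes.csv")
write.csv(airq, out_file, row.names = FALSE)
```

Keeping the node ID as a column lets you filter or group by node on the SAS or SQL side instead of maintaining one file per node.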
You can also get the list of splitting conditions using the function from my answer to the question below:
ctree() - How to get the list of splitting conditions for each terminal node?