How to get variable distribution at terminal nodes of a classification tree built with CHAID in R?

Question

I built a classification tree using the CHAID algorithm as implemented in an R package. The response variable I'm trying to explain can take 41 different values. When I plot the tree, the distribution of the response values is displayed at every terminal node.

I would like to extract these distributions as numbers: for example, that in terminal node 2 the response is 3 for 18% of the initial panel, 4 for 12%, and so on.

Does anyone know how to get that? If the information is drawn, it must exist somewhere, but I couldn't find an easy way to get it.

You should be able to get the rows predicted to be in each terminal node. Then just call `hist()`. — gung - Reinstate Monica, Dec 14 '15 at 17:30
Thank you for your answer. Actually, I'm also having trouble finding out which rows belong to each terminal node. I found a similar issue [here](http://stackoverflow.com/questions/5102754/search-for-corresponding-node-in-a-regression-tree-using-rpart?rq=1), but it is for rpart and doesn't work for CHAID. Maybe someone has an idea? — Joe Charach, Dec 15 '15 at 10:06
Please add a [reproducible example](http://stackoverflow.com/q/5963269/1217536) for people to work with. — gung - Reinstate Monica, Dec 15 '15 at 12:13
I guess you want `predict(object, type = "prob")`. Or you might want something like `tab <- table(fitted(object)[[1]], fitted(object)[[2]])` and then `prop.table(tab, 1)`. It's hard to say more without a reproducible example. Also, I'm not sure that 41 response categories is the best way to code the response variable here. — Achim Zeileis, Dec 19 '15 at 02:15
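
The suggestions in the comments above can be put together in a short sketch. It assumes that the fitted CHAID tree is a `party` object from the partykit package, which is what the `fitted()` and `predict()` calls in the last comment imply. Since no reproducible example was posted, `ctree()` on the built-in `iris` data is used here purely as a stand-in for the CHAID fit; the names `fit`, `within_node`, `of_sample` and `rows_by_node` are illustrative only.

```r
library(partykit)

# Stand-in tree (assumption): replace this with your fitted CHAID model.
fit <- ctree(Species ~ ., data = iris)

# For a party object, fitted() returns a data frame whose first column is the
# terminal node id and whose second column is the observed response.
ff <- fitted(fit)
tab <- table(node = ff[[1]], response = ff[[2]])

# Distribution of the response within each terminal node (each row sums to 1).
within_node <- prop.table(tab, 1)

# Each node/response cell as a share of the whole sample (all cells sum to 1).
of_sample <- prop.table(tab)

# Row indices of the observations that fall into each terminal node.
rows_by_node <- split(seq_len(nrow(iris)), predict(fit, type = "node"))

print(round(within_node, 3))
```

If the percentages shown in the plot are relative to the node, `within_node` is the table to read; if they are relative to the initial panel, as in the "18% of the initial panel" wording of the question, `of_sample` is the closer match.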

0 Answers