I built a classification tree in R using a package that implements the CHAID algorithm. The response variable I'm trying to explain can take 41 different values. When I plot the tree, each terminal node shows the distribution of the response variable.

I would like to extract these distributions as numbers. For example, for terminal node 2, I want to know that the response equals 3 for 18% of the initial panel, 4 for 12%, and so on.

Does anyone know how to get that? If the information is drawn, it must exist somewhere, but I couldn't find an easy way to get it.

MichaelChirico
  • You should be able to get the rows predicted to be in each terminal node. Then just call `hist()`. – gung - Reinstate Monica Dec 14 '15 at 17:30
  • Thank you for your answer. Actually, I'm also having trouble finding out which rows belong to each terminal node. I found a similar question [here](http://stackoverflow.com/questions/5102754/search-for-corresponding-node-in-a-regression-tree-using-rpart?rq=1), but it is about rpart, and the approach doesn't work for CHAID. Maybe someone has an idea? – Joe Charach Dec 15 '15 at 10:06
  • Please add a [reproducible example](http://stackoverflow.com/q/5963269/1217536) for people to work with. – gung - Reinstate Monica Dec 15 '15 at 12:13
  • 1
    I guess you want `predict(object, type = "prob")`. Or you might want something like `tab <- table(fitted(object)[[1]], fitted(object)[[2]])` and then `prop.table(tab, 1)`. It's hard to say more without a reproducible example. Also, I'm not sure that 41 response categories is the best way to code the response variable here. – Achim Zeileis Dec 19 '15 at 02:15
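
Since no reproducible example was posted, here is a minimal sketch of the approach suggested in the comments. It assumes the CHAID tree is a `party` object from partykit, which the `predict(object, type = "prob")` suggestion implies but the question does not confirm. As a stand-in it fits `partykit::ctree()` on `iris`; the object name `fit` and the data are illustrative only. It also takes the node ids from `predict(fit, type = "node")` rather than from the columns of `fitted()`, to avoid relying on column order.

```r
library(partykit)  # assumption: the fitted tree is a partykit "party" object

# Stand-in model: the original data and CHAID call were not posted,
# so a conditional inference tree on iris is used for illustration.
fit <- ctree(Species ~ ., data = iris)

# Terminal node id for every row of the training data
node <- predict(fit, type = "node")

# Cross-tabulate terminal node against the observed response
tab <- table(node = node, response = iris$Species)

# Distribution of the response within each terminal node (rows sum to 1)
node_dist <- prop.table(tab, 1)
print(round(node_dist, 3))

# Share of the whole sample falling in each node/response cell
overall_share <- tab / sum(tab)
print(round(overall_share, 3))

# Per-observation class probabilities, one row per training observation
probs <- predict(fit, type = "prob")
```

The same calls should carry over to a CHAID fit if it is indeed a `party` object: `node` also answers the question of which rows belong to each terminal node, for example via `which(node == 2)`.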
