I have a number of trees that each run to 7 pages when printed. I've had to rebalance the data and need to look at the branches with the highest frequency to see if they make sense: I need to identify a cancellation rate for different clusters.

Because the output is so long, I need to pull out the biggest branches so I can validate those rather than go through 210 branches manually. I will have lots of trees, so I need to automate this to look at the important results.

Example code to use:

library(CHAID)
updatecars<-mtcars
updatecars$cyl<-as.factor(updatecars$cyl)
updatecars$vs<-as.factor(updatecars$vs)
updatecars$am<-as.factor(updatecars$am)
updatecars$gear<-as.factor(updatecars$gear)

# Fit the tree first, then print and plot it
carsChaid<-chaid(am~  cyl+vs+gear, data=updatecars)
carsChaid
plot(carsChaid)

When you print this tree, you see n=15 for the first group. I need a table where I can sort on this value.

What I need is a decision tree table with the variable values and the number within each group from the tree. This is not quite the same as the answer to "Walk a tree", which doesn't give the number within each group, but I think it's in the right direction.

Can someone help?
Thanks,
James

James Oliver

2 Answers

I'm sure there is a better way to do this, but this works. I'm open to corrections and suggested improvements.

The particular trouble I had was creating the list of all combinations. When I gave expand.grid more than 3 factors, it stopped working for me, so I had to build a loop on top of it to create the complete list.
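As a side note, expand.grid is documented to accept any number of vectors or a single list of them, so a four-factor grid can probably be built in one call. The following is a minimal sketch on mtcars, not the data from the question; using carb as the fourth factor is an assumption made purely for illustration.

```r
# Stand-in data: mtcars with four columns turned into factors
# (carb is a hypothetical fourth factor, not part of the original tree)
updatecars <- mtcars
vars <- c("cyl", "vs", "gear", "carb")
updatecars[vars] <- lapply(updatecars[vars], as.factor)

# One row per combination of levels; the columns keep the list names
allcombos <- expand.grid(lapply(updatecars[vars], levels))
dim(allcombos)
```

If this works on the real data as well, the loop that appends the fourth factor one level at a time may not be needed.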

All_canx_rates<-function(Var1,Var2,Var3,Var4,Var5,nametree){
  # Seed row so later rbind calls have matching columns; dropped at the end
  df1<-data.frame("CanxRate"=0,"Num_Canx"=0,"Num_Cust"=0)
  # Capture the unevaluated arguments so columns can be passed as bare names
  pars<-as.list(match.call()[-1])
  a<-nametree[,as.character(pars$Var1)]
  b<-nametree[,as.character(pars$Var2)]
  c<-nametree[,as.character(pars$Var3)]
  d<-nametree[,as.character(pars$Var4)]
  e<-nametree[,as.character(pars$Var5)]

  # All combinations of the first three factors
  allcombos<-expand.grid(levels(a),levels(b),levels(c))
  clean<- allcombos
  # Add the fourth factor one level at a time, starting with its first level
  allcombos$Var4<-levels(d)[1]

  for (lev in levels(d)[-1]) {
    clean$Var4<-lev
    allcombos<-rbind(allcombos,clean)
  }

  # Count customers and cancellations for each combination
  for (i in seq_len(nrow(allcombos))) {
    f1<-allcombos[i,1]
    f2<-allcombos[i,2]
    f3<-allcombos[i,3]
    f4<-allcombos[i,4]

    # Var5 is the outcome; '1' marks a cancellation
    y5<-nrow(nametree[(a %in% f1 & b %in% f2 & c %in% f3 & d %in% f4 & 
                         e =='1'),])
    y4<-nrow(nametree[(a %in% f1 & b %in% f2 & c %in% f3 & d %in% f4),])
    df2<-data.frame("CanxRate"=y5/y4,"Num_Canx"=y5,"Num_Cust"=y4)
    df1<-rbind(df1, df2)
  }

  # Drop the seed row
  df1<-df1[-1,]
  # Make the data frame available for global viewing
  output<<-cbind(allcombos,df1)
}
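For comparison, the same three columns (CanxRate, Num_Canx, Num_Cust) can be sketched without explicit loops using base R's aggregate. This is an alternative under stated assumptions, not a drop-in replacement: it uses mtcars as stand-in data with am == "1" playing the role of a cancellation, and unlike the function above it only returns combinations that actually occur in the data.

```r
# Stand-in data: mtcars, with am == "1" treated as a cancellation
updatecars <- mtcars
for (v in c("cyl", "vs", "gear", "am")) {
  updatecars[[v]] <- as.factor(updatecars[[v]])
}

# Per-row indicators: 1 for a cancellation, 1 for every customer
updatecars$Num_Canx <- as.integer(updatecars$am == "1")
updatecars$Num_Cust <- 1L

# Sum both indicators within every observed combination of the factors
rates <- aggregate(cbind(Num_Canx, Num_Cust) ~ cyl + vs + gear,
                   data = updatecars, FUN = sum)
rates$CanxRate <- rates$Num_Canx / rates$Num_Cust

# Largest groups first
rates <- rates[order(rates$Num_Cust, decreasing = TRUE), ]
rates
```

Sorting on Num_Cust puts the biggest groups at the top, which is what the question asks for when validating only the most important branches.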
James Oliver

You can use data.tree to do further operations on a party object, such as sorting, walking the tree, custom plotting, etc. The latest release, v0.3.7 from GitHub, has a conversion from party class objects:

devtools::install_github("gluc/data.tree@v0.3.7")
library(data.tree)
tree <- as.Node(carsChaid)

tree$fieldsAll

The last command shows the names of the converted fields of the party class:

[1] "data"        "fitted"      "nodeinfo"    "partyinfo"   "split"       "splitlevels" "splitname"   "terms"       "splitLevel" 

You can sort by a function, e.g. the number of rows of the data on each node:

tree$Sort(attribute = function(node) nrow(node$data), decreasing = TRUE)

print(tree, 
      "splitname",
      count = function(node) nrow(node$data), 
      "splitLevel")

This prints, for instance, like so:

  levelName splitname count splitLevel
1     1          gear    32           
2      ¦--3              17       4, 5
3      °--2              15          3
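If a plain, sortable data frame of terminal nodes is all that is needed, a sketch along the following lines might also work without data.tree. It assumes the CHAID package is installed (it is distributed via R-Forge) and that the fitted tree is a partykit party object, so that predict with type = "node" returns the terminal node id for each training row. The names node_table, n, n_pos and rate are made up for this illustration.

```r
library(CHAID)  # assumption: installed from R-Forge; it attaches partykit

# Rebuild the example tree from the question
updatecars <- mtcars
for (v in c("cyl", "vs", "am", "gear")) {
  updatecars[[v]] <- as.factor(updatecars[[v]])
}
carsChaid <- chaid(am ~ cyl + vs + gear, data = updatecars)

# Terminal node id for every training row
node_id <- predict(carsChaid, type = "node")

# Rows and positive outcomes (am == "1") per terminal node
hit <- as.integer(updatecars$am == "1")
node_table <- data.frame(
  node  = sort(unique(node_id)),
  n     = as.vector(tapply(hit, node_id, length)),
  n_pos = as.vector(tapply(hit, node_id, sum))
)
node_table$rate <- node_table$n_pos / node_table$n

# Biggest terminal nodes first
node_table <- node_table[order(node_table$n, decreasing = TRUE), ]
node_table
```

The node ids in the table correspond to the ids shown when the tree is printed, so the split conditions for the largest nodes can be looked up there.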
Christoph Glur