
After fitting a tree with party::ctree(), I want to create a table that characterises its branches.

I fitted these variables:

> summary(juridicos_segmentar)
        actividad_economica
 Financieras      : 89     
 Gubernamental    : 48     
 Sector Primario  : 34     
 Sector Secundario:596     
 Sector Terciario :669     
              ingresos_cut
 (-Inf,1.03e+08]    :931  
 (1.03e+08,4.19e+08]:252  
 (4.19e+08,1.61e+09]:144  
 (1.61e+09, Inf]    :109  

              egresos_cut 
 (-Inf,6e+07]       :922  
 (6e+07,2.67e+08]   :256  
 (2.67e+08,1.03e+09]:132  
 (1.03e+09, Inf]    :126  

             patrimonio_cut
 (-Inf,2.72e+08]    :718   
 (2.72e+08,1.46e+09]:359   
 (1.46e+09,5.83e+09]:191   
 (5.83e+09, Inf]    :168   

   op_ingreso_cut
 (-Inf,3] :1308  
 (3,7]    :  53  
 (7,22]   :  44  
 (22, Inf]:  31

The first one is categorical and the others are ordinal, and I used them to predict another factor variable:

> summary(as.factor(segmento))
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
 27  66  30  39  36  33  39  15  84  70 271 247 101  34 100  74  47  25  48  50

I used the following code

library(party)
fit_jur <- ctree(cluster ~ ., 
             data=data.frame(juridicos_segmentar, cluster=as.factor(segmento)))

to get this tree

> fit_jur

     Conditional inference tree with 31 terminal nodes

Response:  cluster 
Inputs:  actividad_economica, ingresos_cut, egresos_cut, patrimonio_cut, op_ingreso_cut 
Number of observations:  1436 

1) actividad_economica == {Financieras}; criterion = 1, statistic = 4588.487
  2) ingresos_cut <= (4.19e+08,1.61e+09]; criterion = 1, statistic = 62.896
    3) egresos_cut <= (6e+07,2.67e+08]; criterion = 1, statistic = 22.314
      4)*  weights = 70 
    3) egresos_cut > (6e+07,2.67e+08]
      5)*  weights = 10 
  2) ingresos_cut > (4.19e+08,1.61e+09]
    6)*  weights = 9 

[Image: plot of part of the tree]

What I want is a table in which every row is a path from the root node to a leaf, giving the predicted value of segmento, and every column holds the condition on one splitting variable. Something like this:

economic activity     income (range)    expenses (range)    equity (range)    income operations    segmento
Sector Primario                         <=261.000.000                                              18
Sector Primario                         >261.000.000                                               20

The problem is that there are many leaves to characterise, and sometimes a variable appears several times in one path, so I'd like to intersect those conditions, i.e. intersect the ranges.
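Since every ordinal variable here is a factor created by cut(), intersecting two range conditions on the same variable amounts to intersecting their sets of allowed levels; because those levels are contiguous intervals, the result can then be written back as a single range. A minimal base-R sketch, assuming labels of the form "(a,b]" as printed by summary() above; collapse_intervals() is a hypothetical helper, not part of party or partykit:

```r
## Hypothetical helper (not part of party/partykit): merge contiguous cut()
## interval labels such as "(6e+07,2.67e+08]" into a single interval label.
collapse_intervals <- function(labels) {
  inner  <- gsub("^\\(|\\]$", "", labels)        # strip leading "(" and trailing "]"
  parts  <- strsplit(inner, ",", fixed = TRUE)   # "lower,upper"
  lo_txt <- trimws(vapply(parts, `[`, "", 1))
  hi_txt <- trimws(vapply(parts, `[`, "", 2))
  lo <- as.numeric(lo_txt)                       # "-Inf"/"Inf" parse as -Inf/Inf
  hi <- as.numeric(hi_txt)
  o <- order(lo)
  n <- length(o)
  ## each upper bound must equal the next lower bound, otherwise there is a gap
  if (n > 1 && any(hi[o][-n] != lo[o][-1]))
    stop("intervals are not contiguous")
  ## reuse the original text of the outermost bounds
  paste0("(", lo_txt[o][1], ",", hi_txt[o][n], "]")
}

## Two conditions on egresos_cut along one path: keep only the common levels
cond1 <- c("(-Inf,6e+07]", "(6e+07,2.67e+08]", "(2.67e+08,1.03e+09]")
cond2 <- c("(6e+07,2.67e+08]", "(2.67e+08,1.03e+09]", "(1.03e+09, Inf]")
allowed <- intersect(cond1, cond2)
collapse_intervals(allowed)
#> [1] "(6e+07,1.03e+09]"
```

Working on level sets rather than on numeric thresholds avoids re-deriving the cut points; the numbers are only parsed to check that the merged levels really form one interval.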

I've considered data.tree::ToDataFrameTable, but I have no idea how it works with party objects.

Thank you very much guys!


Update: I switched to partykit and now extract the rule that leads to each leaf:

library(partykit)
fit_jur <- ctree(cluster ~ ., 
             data=data.frame(juridicos_segmentar, cluster=as.factor(segmento)))

pathpred <- function(object, ...)
{
  ## coerce to "party" object if necessary
  if(!inherits(object, "party")) object <- as.party(object)

  ## get standard predictions (response/prob) and collect in data frame
  rval <- data.frame(response = predict(object, type = "response", ...))
  rval$prob <- predict(object, type = "prob", ...)

  ## get rules for each node
  rls <- partykit:::.list.rules.party(object)

  ## get predicted node and select corresponding rule
  rval$rule <- rls[as.character(predict(object, type = "node", ...))]

  return(rval)
}

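## keep one row per leaf: the predicted segment and its rule (drop the prob matrix)
ct_pred_jur <- unique(pathpred(fit_jur)[c("response", "rule")])

## write a semicolon-separated CSV (comma as decimal mark)
write.csv2(ct_pred_jur, 'parametrizacion_juridicos.csv')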

Thank you, Achim Zeileis, for pointing me in this direction. However, I still can't intersect the conditions on the same variable within a rule, i.e. evaluate the '&'s. That problem is still open.
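One way to evaluate the '&'s is to parse each rule string. This is a sketch assuming that a factor split is written as var %in% c("lev1", "lev2") and that conditions are joined by " & " (the format partykit's internal .list.rules.party() appears to use for factor splits; check it against your own ct_pred_jur$rule, and note that numeric splits such as x <= 3 would need extra handling). intersect_rule() and rules_to_table() are hypothetical helpers, and the rule below is made up for illustration rather than taken from fit_jur:

```r
## Hypothetical helper: parse one rule string into a named list of allowed
## levels, intersecting repeated conditions on the same variable.
## Assumes conditions like  var %in% c("a", "b")  joined by " & ".
intersect_rule <- function(rule) {
  conds <- strsplit(rule, " & ", fixed = TRUE)[[1]]
  vars  <- trimws(sub("%in%.*$", "", conds))             # variable name before %in%
  levs  <- lapply(conds, function(cond) {
    inside <- sub('^.*%in% c\\("(.*)"\\)$', "\\1", cond)  # text between c(" and ")
    strsplit(inside, '", "', fixed = TRUE)[[1]]           # individual level labels
  })
  ## group level sets by variable; keep levels allowed by every condition
  lapply(split(levs, vars), function(sets) Reduce(intersect, sets))
}

## Hypothetical helper: one row per rule, one column per variable
## ("" = the path never splits on that variable), plus the prediction.
rules_to_table <- function(rules, prediction, all_vars) {
  rows <- lapply(rules, function(r) {
    cond <- intersect_rule(r)
    vapply(all_vars, function(v) {
      if (is.null(cond[[v]])) "" else paste(cond[[v]], collapse = " | ")
    }, character(1))
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  out$segmento <- prediction
  out
}

## Made-up rule in which ingresos_cut is constrained twice
rule <- paste(
  'actividad_economica %in% c("Financieras")',
  'ingresos_cut %in% c("(-Inf,1.03e+08]", "(1.03e+08,4.19e+08]", "(4.19e+08,1.61e+09]")',
  'ingresos_cut %in% c("(1.03e+08,4.19e+08]", "(4.19e+08,1.61e+09]", "(1.61e+09, Inf]")',
  sep = " & ")

intersect_rule(rule)$ingresos_cut
#> [1] "(1.03e+08,4.19e+08]" "(4.19e+08,1.61e+09]"

rules_to_table(rule, "4", c("actividad_economica", "ingresos_cut", "egresos_cut"))
```

With the objects from the code above, a call such as rules_to_table(ct_pred_jur$rule, ct_pred_jur$response, c("actividad_economica", "ingresos_cut", "egresos_cut", "patrimonio_cut", "op_ingreso_cut")) should then give one row per leaf. Since the remaining levels of an ordinal variable are contiguous, each cell could further be shortened to a single range.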

Samuel Liew

1 Answer


You can convert both the party class (from partykit) and the BinaryTree class (from party) to a data.tree Node, and then use that for conversion to a data frame and/or for printing. For example:

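library(party)
library(data.tree)   # provides as.Node() and ToDataFrameTable()

## fit a conditional inference tree on the complete Ozone cases
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq,
               controls = ctree_control(maxsurrogate = 3))

## convert the BinaryTree to a data.tree Node, then tabulate one row per leaf
tree <- as.Node(airct)
df <- ToDataFrameTable(tree,
      "pathString",
      "label",
      criterion = function(x) round(x$criterion$maxcriterion, 3),
      statistic = function(x) round(max(x$criterion$statistic), 3)
)
df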

This will print like so:

  pathString        label criterion statistic
1      1/2/3 weights = 10     0.000     0.000
2    1/2/4/5 weights = 48     0.936     6.141
3    1/2/4/6 weights = 21     0.891     5.182
4      1/7/8 weights = 30     0.675     3.159
5      1/7/9  weights = 7     0.000     0.000

Plotting:

## plot the subtree rooted at node 2 (plot() on a data.tree Node uses the DiagrammeR package)
subtree <- Clone(tree$`2`)
SetNodeStyle(subtree, 
             style = "filled,rounded", 
             shape = "box", 
             fillcolor = "GreenYellow", 
             fontname = "helvetica", 
             label = function(x) x$label,
             tooltip = function(x) round(x$criterion$maxcriterion, 3))
plot(subtree)

And the result will look like this:

[Image: plot of the subtree]

Christoph Glur