6

What does the number on top of a node in a fancyRpartPlot decision tree mean? I've highlighted them in the picture below.

Example fancyRPartPlot

My guess is that they are the order/rank of the nodes, but I can't explain the jumps in the numbers (in the example, 9-11 are missing).

marqram
  • Well, it's just the number of the node. Use `print` on your tree object and then the ordering of the numbers will make much more sense. – zielinskipp Aug 08 '17 at 14:05

3 Answers

7

The numbers at the top of each node in the tree correspond to the branch numbers in the textual representation of the trees as generated by the default print() method. To confirm:

> dt <- rpart::rpart(Species ~ ., iris)
> print(dt)
n= 150 

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 150 100 setosa (0.33 0.33 0.33)  
  2) Petal.Length< 2.45 50   0 setosa (1.00 0.00 0.00) *
  3) Petal.Length>=2.45 100  50 versicolor (0.00 0.50 0.50)  
    6) Petal.Width< 1.75 54   5 versicolor (0.00 0.91 0.093) *
    7) Petal.Width>=1.75 46   1 virginica (0.00 0.022 0.98) *
> rattle::fancyRpartPlot(dt)


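The "jumps" follow from how rpart numbers nodes: the children of node n are numbered 2n and 2n+1, so numbers are skipped wherever a node is terminal. In the output above, node 2 is a leaf, so nodes 4 and 5 do not exist. Branches that rpart() removes while tuning (pruning) the tree likewise do not appear in the final tree.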
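
To check where the gaps come from, here is a minimal sketch on the same iris tree. It assumes the node numbers are stored as the row names of `dt$frame` and that leaves are marked `<leaf>` in its `var` column; the variable names are only illustrative.

```r
library(rpart)

dt <- rpart(Species ~ ., data = iris)

# Node numbers are stored as the row names of the tree's frame.
nodes <- as.integer(rownames(dt$frame))
is_leaf <- dt$frame$var == "<leaf>"

# Every internal node n should have children numbered 2n and 2n + 1.
internal <- nodes[!is_leaf]
children_ok <- all(c(2L * internal, 2L * internal + 1L) %in% nodes)

# Numbers that never appear would have belonged to descendants of leaves.
missing_nums <- setdiff(seq_len(max(nodes)), nodes)

print(nodes)
print(children_ok)
print(missing_nums)
```

For the tree printed above, the missing numbers are 4 and 5, the would-be children of leaf node 2.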

Graham Williams
0

fancyRpartPlot is just a wrapper for prp. Looking at the source code of prp, these appear to be the node numbers, drawn by:

  if(nn || ni)
        draw.node.numbers(nn, ni, draw.shadows1, type, branch,
                Margin, xflip, yflip, cex,
                main, sub, col.main, cex.main, col.sub, cex.sub,
                xlim, ylim, node.xy, is.leaf, nodes,
                node.labs, font,  family, box.col, border.col, shadow.col,
                under.cex, under.font, under.ygap, ygap,
                split.labs, split.cex * cex, split.font, split.family, split.box.col,
                split.border.col, split.shadow.col,
                nn.cex, nn.font, nn.family, nn.col, nn.box.col,
                nn.border.col, nn.lty, nn.lwd, nn.round,
                split.adj, split.space, split.yspace, split.yshift,
                yshift, adj, space, yspace, shadow.offset,
                nn.adj, nn.yshift, nn.space, nn.yspace, bg)

list(node.boxes=node.boxes, split.boxes=split.boxes)

https://github.com/cran/rpart.plot/blob/master/R/prp.R

You can also find some comments about this in the source code.
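
As a rough illustration, the node numbers can be switched on directly in prp through its `nn` argument, which is the flag tested in the excerpt above. This is only a sketch on the iris data, not a reproduction of what fancyRpartPlot does internally.

```r
library(rpart)
library(rpart.plot)

dt <- rpart(Species ~ ., data = iris)

# nn = TRUE draws the node number above each node;
# with nn = FALSE (prp's default) the numbers are omitted.
res <- prp(dt, nn = TRUE)
```

The numbers drawn this way should match the ones listed by print(dt).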

Hack-R
  • Thank you! I looked through the print(tree) result as well and found the node numbers there too. It did not become clear to me, though, why there are gaps in the numbering of the nodes. Are those nodes that disappeared in a pruning stage? – marqram Aug 08 '17 at 14:40
  • Could be. I'm in a meeting right now but I will follow up after I get off work. – Hack-R Aug 08 '17 at 15:27
0

The top number of each node represents the majority category/class ID. In your case, at the start node, 0.4 (or 40%) is category value "2".