How to set different kind of bar plot in terminal nodes?

Question

I am running a MOB tree on a dataset and I want to modify plots in terminal nodes. I am going to use bar chart of the coefficients of the models which fitted by MOB in each node as my terminal node.

For example, I run the MOB tree on "PimaIndiansDiabetes" dataset in "mlbench" package. Here is the codes:

pid_formula <- diabetes ~ glucose | pregnant + pressure + triceps +   
insulin + mass + pedigree + age
logit <- function(y, x, start = NULL, weights = NULL, offset = NULL, ...) {
glm(y ~ 0 + x, family = binomial, start = start, ...)
}
pid_tree <- mob(pid_formula, data = PimaIndiansDiabetes, fit = logit)

then I have model for each node. for example I have "mass=-9.95+0.058*glucose" for Node number 2. I want to make bar charts from these coefficients (ex: -9.95 and 0.058 for node number 2) and use these bar charts as my terminal nodes in final tree plot. Any idea how to do that? Thanks in advance.

Please make this question reproducible: please read https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and/or https://stackoverflow.com/help/mcve, then edit your question. Otherwise it is too vague and unfortunately less likely to get a speedy answer. — r2evans, Dec 18 '17 at 07:14

score 1 · Accepted Answer · answered Dec 19 '17 at 23:33

To implement such a graphic in partykit you would have to write a new panel function for the plot() method (or rather a panel-generating function). The starting point could be partykit::node_barplot which first extracts the fitted probabilities of a classification tree and then draws them using the grid package. Instead, you could extract the estimated parameters with coef() and then draw these using grid. It's a bit technical but not extremely complicated.

However, I wouldn't recommend to implement such a function. The reason is that this would be best suited to compare the different coefficients within the same node. But as slope and intercept are on completely different scales this is not easy to interpret. Instead one should give more emphasis to differences in the same coefficient across nodes. The basis for this would also be:

coef(pid_tree)
##   x(Intercept)   xglucose
## 2    -9.951510 0.05870786
## 4    -6.705586 0.04683748
## 5    -2.770954 0.02353582

Additionally one could consider the corresponding standard errors for confidence intervals. (Keep in mind that these have to be taken with a grain of salt, though: They do not adjust for estimating the tree but pretend the terminal groups were given exogenously. Still they are useful as rough yardsticks.) I include a small convenience function to do this:

confintplot <- function(object, ylim = NULL,
  xlab = "Parameter per node", ylab = "Estimate",
  main = "", index = NULL, ...)
{
  ## point estimates and interval
  cf <- coef(object)
  node <- nodeids(object, terminal = TRUE)
  ci <- nodeapply(object, ids = node, FUN = function(n)
                  confint(info_node(n)$object, ...))
  if (!is.null(index)) {
    cf <- cf[, index, drop = FALSE]
    ci <- lapply(ci, "[", index, , drop = FALSE)
  }
  cfnm <- rownames(ci[[1L]])
  nodenm <- rownames(cf)

  ## set up dimensions
  n <- length(ci)
  k <- nrow(ci[[1L]])
  at <- t(outer(1:k, seq(-0.15, 0.15, length.out = n), "+"))

  ## empty plot
  if(is.null(ylim)) ylim <- range(unlist(ci))
  plot(0, 0, type = "n", xlim = range(at), ylim = ylim,
    xlab = xlab, ylab = ylab, main = main, axes = FALSE)

  ## draw every parameter
  for(i in 1L:k) {
    arrows(at[,i], sapply(ci, "[", i, 1L), at[,i], sapply(ci, "[", i, 2L),
      code = 3, angle = 90, length = 0.05)
    points(at[, i], cf[, cfnm[i]], pch = 19, col = "white", cex=1.15)
    points(at[, i], cf[, cfnm[i]], pch = nodenm, cex = 0.65)
  }

  axis(1, at = 1:k, labels = cfnm)
  axis(2)
  box()
}

Using this we can create one plot for each parameter (intercept vs. slope) separately. This shows that the intercept is increasing across nodes while the slope is decreasing.

par(mfrow = c(1, 2))
confintplot(pid_tree, index = 1)
confintplot(pid_tree, index = 2)

It is also possible to show these on a common y-axis. However, this completely obscures the changes in the slope because of the different scales:

confintplot(pid_tree)

Final comment: I would recommend using glmtree() for this particular kind of model instead of mob() "by hand". The former is faster and provides some extra features, especially easy forecasting.

How to set different kind of bar plot in terminal nodes?

1 Answers1