
I trained a random forest with party::cforest, using n_trees trees, for a regression (continuous response). With predict(type = "response"), all one gets is the mean of the n_trees responses. How do I get the response of each individual tree (that is, n_trees responses)? Thank you very much! I've been trying for weeks and I'm still clueless!

I also tried training the forest with partykit, but I still cannot find a way of getting all responses. The documentation has an example with a quantile function. Since I can't get all responses explicitly, I thought I could at least get some statistics from them, so I tried getting their median with function(y, w) median(y), but that gives me the same value for all data points. So I don't really understand how FUN is supposed to work in partykit's predict().

I also tried predict(type = "prob"), as suggested in other posts for classification random forests, but that gave me the error "cannot compute empirical distribution function with non-integer weights".

So I remain clueless. Thank you for any help!

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Aug 04 '23 at 17:49

1 Answer


The ntree individual predictions are actually not computed within cforest(). Instead the predictions of the forest are computed as weighted means of the original responses, where the weights depend on the new data points.

However, you can set up the ntree individual trees and compute the predictions yourself. All the necessary information is in the cforest object.

Let's consider the following simple example for the cars data using a forest with only 10 trees:

library("partykit")
set.seed(1)
cf <- cforest(dist ~ speed, data = cars, ntree = 10)

Then you can obtain the predictions for two new data points:

nd <- data.frame(speed = c(10, 20)) 
predict(cf, newdata = nd)  
##        1        2
## 22.65411 63.11666
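
The weighted-mean view can be checked directly. The following is a minimal sketch, assuming that predict() for cforest objects accepts type = "weights" and a FUN with arguments (y, w), as described in the partykit documentation. The name wmedian is a hypothetical helper written for illustration and is not part of partykit.

```r
library("partykit")
set.seed(1)
cf <- cforest(dist ~ speed, data = cars, ntree = 10)
nd <- data.frame(speed = c(10, 20))

# Nearest-neighbor weights: one row per training observation,
# one column per new data point.
w <- predict(cf, newdata = nd, type = "weights")

# Weighted mean of the original responses, computed by hand.
y <- cf$fitted[["(response)"]]
manual <- apply(w, 2, function(wj) weighted.mean(y, wj))

# Should agree with the forest prediction up to numerical tolerance.
all.equal(as.numeric(manual), as.numeric(predict(cf, newdata = nd)))

# Hypothetical helper: weighted median of y under case weights w.
wmedian <- function(y, w) {
  o <- order(y)                 # sort responses
  cw <- cumsum(w[o]) / sum(w)   # cumulative share of weight
  y[o][which(cw >= 0.5)[1]]     # first response reaching half the weight
}

# FUN receives the full response vector y and the weights w
# belonging to one new data point at a time.
med <- predict(cf, newdata = nd, FUN = wmedian)
```

A FUN that ignores w, such as function(y, w) median(y), is handed the same full response vector for every new data point, which would explain getting one identical value for all of them.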

To replicate this, we can also set up the 10 individual trees from the forest. For this we use the constparty class, which is also what ctree() returns:

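# Rebuild each tree as a constparty: the node structure comes from
# cf$nodes[[i]]; data, terms, and response are shared across trees;
# the case weights cf$weights[[i]] are specific to tree i.
ct <- lapply(seq_along(cf$nodes), function(i) as.constparty(
  party(cf$nodes[[i]], data = cf$data, terms = cf$terms,
    fitted = data.frame(
      `(response)` = cf$fitted[["(response)"]],
      `(weights)` = cf$weights[[i]],
      check.names = FALSE))
))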

To the list of 10 constparty trees you can then apply the predict() method to obtain the 10 individual predictions and compute their mean:

p <- sapply(ct, predict, newdata = nd)
dim(p)
## [1]  2 10
rowMeans(p)
##        1        2 
## 22.65411 63.11666 

But now you can also inspect the full 2 x 10 matrix p with the predictions from all individual trees.
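
With the per-tree matrix available, summaries across trees are a matter of applying a function over its rows. A minimal sketch, repeating the setup from above so that it runs on its own; the chosen summaries (standard deviation and the 10%, 50%, 90% quantiles) are illustrative assumptions, not something the forest reports itself:

```r
library("partykit")
set.seed(1)
cf <- cforest(dist ~ speed, data = cars, ntree = 10)
nd <- data.frame(speed = c(10, 20))

# Individual trees as constparty objects, as constructed above.
ct <- lapply(seq_along(cf$nodes), function(i) as.constparty(
  party(cf$nodes[[i]], data = cf$data, terms = cf$terms,
    fitted = data.frame(
      `(response)` = cf$fitted[["(response)"]],
      `(weights)` = cf$weights[[i]],
      check.names = FALSE))
))

# Rows are new data points, columns are trees.
p <- sapply(ct, predict, newdata = nd)

# Spread of the tree-level predictions for each new data point.
tree_sd <- apply(p, 1, sd)

# Unweighted quantiles across trees; the result has one column per data point.
tree_q <- apply(p, 1, quantile, probs = c(0.1, 0.5, 0.9))
```

Note that these are quantiles of the tree-level means, which is not the same thing as a weighted quantile of the original responses under the forest's neighborhood weights.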

Achim Zeileis
  • Thank you very much! That is exactly what I was searching for. And it is not so straightforward... I wonder why this is not a default in partykit. Thank you! – Guilherme S Mohor Aug 14 '23 at 15:15
  • The approach using neighborhood weights is much more flexible, allowing you to obtain adaptive local likelihood estimators for more general statistical models. For example, you can use the following models in the leaves of the trees: survival, ordinal, distributional, transformation, treatment effects, etc. See the random forest publications on my web page (zeileis.org) for concrete references. – Achim Zeileis Aug 14 '23 at 17:09