
I'm looking for a random forest package in python or R that will let me get some fine-grained details on the final forest that was built. In particular, I would like to:

  1. Get some representation of the trees created;
  2. For each tree in the forest, get an overall measure of how well it fits the data (like entropy);
  3. For each record in the training set and each tree in the forest, get a record of which terminal leaf it ended up in; and
  4. For a new record and each tree in the forest, get a record of which terminal leaf it ends up in.

I realize that a solution to (4) will also work as a solution to (3), but I'm guessing that (3) should be relatively easy to do by keeping tabs on the results as the forest is grown.

I've looked pretty hard at the available options in both R and python, and I can't find an off-the-shelf routine that supplies all four of these requirements. (It's hard enough to find one that satisfies condition (1) of actually letting you see the forest directly.)
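(For what it's worth, scikit-learn's `RandomForestClassifier` does expose most of this directly; a minimal sketch, with an illustrative synthetic dataset and model settings:)

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
rf = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

# (1) Each fitted tree is available via estimators_
first_tree = rf.estimators_[0]
print(export_text(first_tree))  # text representation of the tree's splits

# (3)/(4) apply() returns, for each record, the index of the terminal leaf
# it lands in for every tree: an array of shape (n_samples, n_trees).
# It works for training records and new records alike.
leaves = rf.apply(X)
print(leaves.shape)  # (100, 5)

# (2) A per-tree fit measure: leaf impurities (gini here) weighted by the
# number of training samples reaching each leaf
t = first_tree.tree_
leaf_mask = t.children_left == -1  # leaves have no children
weighted_impurity = (
    np.sum(t.impurity[leaf_mask] * t.n_node_samples[leaf_mask]) / t.n_node_samples[0]
)
print(weighted_impurity)
```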

If anyone knows of something I've missed, or has constructed such a routine themselves, I'd very much like a link/reference to it.

David Pepper

2 Answers


Try this:

treeinterpreter is a package for interpreting scikit-learn's decision tree and random forest predictions. It decomposes each prediction into bias and feature-contribution components, as described in http://blog.datadive.net/interpreting-random-forests/. For a dataset with n features, each prediction is decomposed as prediction = bias + feature_1_contribution + ... + feature_n_contribution.

pip install treeinterpreter

http://blog.datadive.net/random-forest-interpretation-with-scikit-learn/

Merlin
  • Hmmm -- never saw this one before. Thanks for the hint; I will check it out. – David Pepper Jun 08 '16 at 00:37
  • If something is missing there, you can add the analysis yourself by looking at the trees generated. See https://stackoverflow.com/questions/50600290/how-extraction-decision-rules-of-random-forest-in-python – Jon Nordby Jun 23 '18 at 20:41

Another solution is lime. It explains the feature weights behind an individual prediction, and its matplotlib-based visualizations integrate easily with Jupyter (IPython) notebooks.