I'm looking for a random forest package in Python or R that will let me get fine-grained details on the final forest that was built. In particular, I would like to:
1. Get some representation of the trees created;
2. For each tree in the forest, get an overall measure of how well it fits the data (like entropy);
3. For each record in the training set and each tree in the forest, find out which terminal leaf the record ended up in; and
4. For a new record, and for each tree in the forest, find out which terminal leaf the record ends up in.
I realize that a solution to (4) will also work as a solution to (3), but I'm guessing that (3) should be relatively easy to do by keeping tabs on the results as the forest is grown.
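To make the four requirements concrete, here is a minimal sketch of the kind of output I have in mind. It does not use any real package: the nested-dict tree format, the hand-written two-tree `forest`, and the helper names (`leaf_of`, `leaf_matrix`, `tree_entropy`) are all made up for illustration, and the weighted leaf entropy is just one possible choice for the fit measure in (2).

```python
import math
from collections import Counter

# Hypothetical toy format: internal nodes test record[feature] <= threshold,
# leaves carry an integer leaf id. This dict layout is itself the
# "representation of the trees" from requirement (1).
def leaf(leaf_id):
    return {"leaf": leaf_id}

def split(feature, threshold, left, right):
    return {"feature": feature, "threshold": threshold,
            "left": left, "right": right}

# A hand-written forest of two trees (not grown from data).
forest = [
    split(0, 0.5, leaf(0), leaf(1)),
    split(1, 2.0, leaf(0), split(0, 0.8, leaf(1), leaf(2))),
]

def leaf_of(tree, record):
    """Walk one tree and return the id of the leaf the record lands in."""
    node = tree
    while "leaf" not in node:
        goes_left = record[node["feature"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node["leaf"]

def leaf_matrix(forest, records):
    """Requirements (3) and (4): one row per record, one column per tree."""
    return [[leaf_of(tree, r) for tree in forest] for r in records]

def tree_entropy(tree, records, labels):
    """Requirement (2): label entropy per leaf, weighted by leaf size."""
    labels_by_leaf = {}
    for r, y in zip(records, labels):
        labels_by_leaf.setdefault(leaf_of(tree, r), []).append(y)
    total = len(records)
    result = 0.0
    for ys in labels_by_leaf.values():
        counts = Counter(ys)
        h = -sum((c / len(ys)) * math.log2(c / len(ys))
                 for c in counts.values())
        result += (len(ys) / total) * h
    return result

# Made-up training data: two features per record, one class label each.
X_train = [(0.2, 1.0), (0.7, 3.0), (0.9, 3.0), (0.4, 2.5)]
y_train = ["a", "b", "b", "a"]

train_leaves = leaf_matrix(forest, X_train)       # requirement (3)
new_leaves = leaf_matrix(forest, [(0.6, 1.5)])    # requirement (4)
entropies = [tree_entropy(t, X_train, y_train) for t in forest]  # (2)
```

The same `leaf_matrix` call serves both the training records and a new record, which is why a solution to (4) would cover (3) as well.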
I've looked pretty hard at the available options in both R and Python, and I can't find an off-the-shelf routine that meets all four of these requirements. (It's hard enough to find one that satisfies condition (1) of actually letting you see the forest directly.)
If anyone knows of something I've missed, or has constructed such a routine themselves, I'd very much like a link/reference to it.