
I'm using a decision tree for binary classification, and I'd like to find the terminal node (leaf) with the "purest" classification, i.e. the subspace of the input space in which a single class dominates. To avoid overfitting, I'm setting the min_samples_leaf parameter.

More specifically, I'd like to:

  • Go over all the leaves in the trained decision tree.
  • Find the 0/1 ratio in each leaf.
  • Find the rules corresponding to each leaf.

I've seen previous posts about extracting the rules, but I haven't figured out how to find the 0/1 ratio per leaf.
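
For reference, the three steps listed above could be sketched roughly as follows. This is only a minimal sketch, assuming scikit-learn's `DecisionTreeClassifier` and its fitted `tree_` attribute; the helper name `leaf_summaries`, the synthetic data from `make_classification`, and the feature names `x0`..`x3` are made up for illustration. The per-node class distribution is normalized explicitly, since `tree_.value` may hold either counts or fractions depending on the scikit-learn version.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def leaf_summaries(clf, feature_names):
    """Hypothetical helper: walk the fitted tree and describe every leaf."""
    tree = clf.tree_
    summaries = []

    def walk(node, rules):
        left = tree.children_left[node]
        right = tree.children_right[node]
        if left == right:  # leaf: both child indices are -1
            dist = tree.value[node][0]   # per-class counts or fractions
            frac = dist / dist.sum()     # normalize so both cases work
            summaries.append({
                "node": node,
                "n_samples": int(tree.n_node_samples[node]),
                "frac_class_0": float(frac[0]),
                "frac_class_1": float(frac[1]),
                "purity": float(frac.max()),
                "rules": rules,
            })
            return
        name = feature_names[tree.feature[node]]
        thr = tree.threshold[node]
        # scikit-learn sends samples with feature <= threshold to the left
        walk(left, rules + [f"{name} <= {thr:.3f}"])
        walk(right, rules + [f"{name} > {thr:.3f}"])

    walk(0, [])  # node 0 is the root
    return summaries


# Synthetic data, purely for illustration
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
clf = DecisionTreeClassifier(min_samples_leaf=20, random_state=0).fit(X, y)
names = [f"x{i}" for i in range(X.shape[1])]

leaves = leaf_summaries(clf, names)
purest = max(leaves, key=lambda s: s["purity"])
print(purest["n_samples"], purest["frac_class_0"], purest["frac_class_1"])
print(" AND ".join(purest["rules"]))
```

Sorting `leaves` by `purity` (and perhaps filtering on `n_samples`) would then give candidate rule sets to inspect by hand.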

Adam Haber
  • What did you do so far? – Mohamed Ali JAMAOUI Sep 13 '17 at 13:57
  • Followed the advice [here](https://stackoverflow.com/questions/20224526/how-to-extract-the-decision-rules-from-scikit-learn-decision-tree) in order to print the rules, but couldn't find how to extract the 0/1 ratio – Adam Haber Sep 13 '17 at 17:16
  • Why are you looking for the "purest" leaf? Sometimes that measure can be meaningless, especially if the terminal node was reached too high/low in the tree (i.e. "pureness" can denote an easier part of the problem space OR a part with very few data points OR something else...) – carrdelling Sep 13 '17 at 21:30
  • I'm interested in finding a simple set of rules that will help me find a group of 0s (or 1s) that is as homogeneous as possible. Indeed, I'd have to make sure these rules "make sense", and control for the minimal number of data points per leaf. – Adam Haber Sep 14 '17 at 06:14

0 Answers