I'm using a decision tree for binary classification, and I'd like to find the terminal node (leaf) with the "purest" classification, i.e. the leaf corresponding to a region of the input space in which a single class dominates. To avoid overfitting, I'm setting the min_samples_leaf parameter.
More specifically, I'd like to:
- Go over all the leaves in the trained decision tree.
- Find the 0/1 ratio in each leaf.
- Find the rules corresponding to each leaf.
I've seen previous posts about extracting the rules, but I haven't figured out how to find the ratio per leaf.
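
To make the goal concrete, here is a rough sketch of the three steps above. It assumes scikit-learn's DecisionTreeClassifier (the library is my assumption here, suggested by min_samples_leaf), uses toy data from make_classification in place of a real dataset, and the helper name leaf_rules is made up for illustration. The ratio is computed by mapping each training sample to its leaf with apply() rather than by reading tree_.value, since the contents of tree_.value (counts vs. proportions) may differ between scikit-learn versions. This may well not be the best way to do it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for the real dataset (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=4, random_state=0)

clf = DecisionTreeClassifier(min_samples_leaf=20, random_state=0).fit(X, y)

# Steps 1 and 2: map each training sample to its leaf, then compute the
# fraction of class 1 (and the dominant-class share) inside each leaf.
leaf_of_sample = clf.apply(X)
leaf_stats = {}
for leaf in np.unique(leaf_of_sample):
    labels_in_leaf = y[leaf_of_sample == leaf]
    frac_ones = float(labels_in_leaf.mean())
    leaf_stats[int(leaf)] = {
        "n": int(labels_in_leaf.size),
        "frac_1": frac_ones,
        "purity": max(frac_ones, 1.0 - frac_ones),
    }


def leaf_rules(tree):
    """Hypothetical helper: return {leaf_id: [conditions on the root-to-leaf path]}."""
    rules = {}
    stack = [(0, [])]  # (node id, conditions collected so far); node 0 is the root
    while stack:
        node, path = stack.pop()
        left = int(tree.children_left[node])
        right = int(tree.children_right[node])
        if left == right:  # both are -1 for a leaf
            rules[int(node)] = path
            continue
        feat, thr = int(tree.feature[node]), float(tree.threshold[node])
        # Samples with feature value <= threshold go to the left child.
        stack.append((left, path + [f"x[{feat}] <= {thr:.3f}"]))
        stack.append((right, path + [f"x[{feat}] > {thr:.3f}"]))
    return rules


# Step 3: rules per leaf, then pick the leaf where one class dominates most.
rules = leaf_rules(clf.tree_)
purest = max(leaf_stats, key=lambda leaf: leaf_stats[leaf]["purity"])
print("purest leaf:", purest, leaf_stats[purest])
print("rule:", " AND ".join(rules[purest]) or "(root only)")
```

The ratios here are measured on the training data; passing a held-out set to apply() instead would give a less optimistic estimate of each leaf's purity, though some leaves might then receive no samples.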