I was looking for this information in the tensorflow_decision_forests
docs (https://github.com/tensorflow/decision-forests) (https://www.tensorflow.org/decision_forests/api_docs/python/tfdf/keras/wrappers/CartModel) and yggdrasil_decision_forests
docs (https://github.com/google/yggdrasil-decision-forests).
I've also taken a look at the code of these two libraries, but I didn't find that information there either. I'm also curious whether I can specify which impurity index to use. I'm looking for an analogue of sklearn decision trees, where you can choose the impurity measure with the criterion parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
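For reference, this is the sklearn behaviour I mean — a small sketch showing that `DecisionTreeClassifier` takes the impurity index via `criterion` ("gini" is the default, "entropy" is an alternative):

```python
# scikit-learn lets you pick the impurity measure per model
# via the `criterion` constructor parameter.
from sklearn.tree import DecisionTreeClassifier

# Tiny toy dataset: the label equals the first feature.
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 0, 1]

# Same data, two different impurity indices.
clf_gini = DecisionTreeClassifier(criterion="gini").fit(X, y)
clf_entropy = DecisionTreeClassifier(criterion="entropy").fit(X, y)
```

I can't find an equivalent parameter on the TF-DF model wrappers.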
For the TensorFlow Random Forest I found only the parameter uplift_split_score:
uplift_split_score: For uplift models only. Splitter score, i.e. the score optimized by the splitters. The scores are introduced in "Decision trees for uplift modeling with single and multiple treatments", Rzepakowski et al. Notation: p = probability / average value of the positive outcome, q = probability / average value in the control group.
- KULLBACK_LEIBLER or KL: p log (p/q)
- EUCLIDEAN_DISTANCE or ED: (p-q)^2
- CHI_SQUARED or CS: (p-q)^2/q
Default: "KULLBACK_LEIBLER".
I'm not sure if it's a good lead.