
Gradient Boosting learns a function that looks something like this:

F(X) = w_1*T_1(X) + w_2*T_2(X) + ... + w_M*T_M(X)

where the w_i are weights and the T_i are weak learners (decision trees). I know how to extract the individual T_i from a fitted gradient boosting model in scikit-learn (the estimators_ attribute), but is there a way to extract the w_i?
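
For concreteness, a minimal sketch of the extraction I have so far (the dataset and settings here are just placeholders):

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(random_state=0)
gbrt = GradientBoostingRegressor(n_estimators=10).fit(X, y)

trees = gbrt.estimators_[:, 0]  # the individual T_i, one DecisionTreeRegressor per stage
# ...but where are the w_i?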

rytido

1 Answer


Well... the w_i are the line-search estimate times the learning rate. In sklearn the learning rate is constant, so it is pulled out of the sum. In gradient boosting there is actually one weight per terminal region (i.e. per leaf), not one per tree. Those per-leaf estimates are stored directly in the trees and are updated during the fitting of the gradient boosting model (see [1]).

To access the estimates for the terminal regions of the first tree:

from sklearn.tree._tree import TREE_LEAF  # sentinel value; TREE_LEAF == -1

tree = gbrt.estimators_[0, 0].tree_          # low-level Tree of the first stage
leaf_mask = tree.children_left == TREE_LEAF  # leaves are the nodes without children
w_i = tree.value[leaf_mask, 0, 0]            # per-leaf estimates
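
To see which w_i applies to a given sample, map the samples to their leaves -- a minimal sketch, assuming gbrt is the fitted model and X the input matrix:

first_tree = gbrt.estimators_[0, 0]    # a fitted DecisionTreeRegressor
leaf_ids = first_tree.apply(X)         # index of the leaf each sample falls into
# this tree's contribution to the prediction is learning_rate * leaf estimate
contribution = gbrt.learning_rate * first_tree.tree_.value[leaf_ids, 0, 0]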

[1] https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/gradient_boosting.py#L197

Peter Prettenhofer
  • If I'm understanding you correctly, since the weights are incorporated into the trees, the fitted GB model in sklearn is just the straight average of the individual trees? Because I thought I tried that and got a different answer than gbrt.predict. – rytido Oct 15 '14 at 00:22
  • No -- it's a sum over the corresponding leaf estimates of each tree, weighted by the learning rate (see the sketch after these comments). – Peter Prettenhofer Oct 16 '14 at 09:05
  • And don't forget the initial model (mean for squared loss; log-odds for log loss). – Peter Prettenhofer Oct 16 '14 at 09:06
  • @PeterPrettenhofer Could you please explain why TREE_LEAF == -1 for terminal regions? – Dudelstein Mar 14 '23 at 14:08
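
A small sketch verifying the decomposition described in these comments, assuming squared loss (for which the initial model is the training mean); the dataset is again a placeholder:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(random_state=0)
gbrt = GradientBoostingRegressor(n_estimators=10, random_state=0).fit(X, y)

pred = np.full(X.shape[0], y.mean())   # initial model: the training mean
for tree in gbrt.estimators_[:, 0]:    # one regression tree per boosting stage
    pred += gbrt.learning_rate * tree.predict(X)

assert np.allclose(pred, gbrt.predict(X))  # matches the fitted ensemble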