
On a dataset with a few features I trained both a random forest regressor and gradient boosted regression trees. For both models I calculated the feature importances, and they turn out to be rather different, even though the two models achieve similar scores.

For the random forest regression:

MAE: 59.11
RMSE: 89.11

Importances (%):

Feature 1: 64.87
Feature 2: 0.10
Feature 3: 29.03
Feature 4: 0.09
Feature 5: 5.89

For the gradient boosted regression trees:

MAE: 58.70
RMSE: 90.59

Importances (%):

Feature 1: 65.18
Feature 2: 5.67
Feature 3: 13.61
Feature 4: 4.26
Feature 5: 11.27

Why is this? I thought it might be because the trees in gradient boosted regression trees are shallower than those in a random forest, but I am not sure.
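For context, the setup is roughly as follows (a minimal sketch: synthetic data stands in for my actual dataset, and the hyperparameters are illustrative, not the ones I used):

    # Sketch of the comparison; make_regression stands in for the real data.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

    X, y = make_regression(n_samples=1000, n_features=5, noise=10.0, random_state=0)

    rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    gbr = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)

    # Both expose normalized importances that sum to 1; scale to % for display.
    print("RF:  ", np.round(rf.feature_importances_ * 100, 2))
    print("GBRT:", np.round(gbr.feature_importances_ * 100, 2))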

Olivier_s_j
  • There is no reason they should be the same, as each algorithm calculates them differently. Feature importance is not a well-defined property like, let's say, prediction accuracy. – elyase Jan 10 '15 at 00:15
  • @elyase I think the method for calculating the feature importance is independent of the type of tree-based ensemble algorithm. Probably as described here: http://stackoverflow.com/questions/15810339/how-are-feature-importances-in-randomforestclassifier-determined – Olivier_s_j Jan 10 '15 at 15:08
  • I have posted the relevant code. – elyase Jan 10 '15 at 16:34

1 Answer


Though they are both tree-based, they are still different algorithms, so each calculates the feature importances differently. Here is the relevant code:

scikit-learn/sklearn/ensemble/gradient_boosting.py

def feature_importances_(self):
    total_sum = np.zeros((self.n_features, ), dtype=np.float64)
    for stage in self.estimators_:
        # Average the normalized per-tree importances within each stage...
        stage_sum = sum(tree.feature_importances_
                        for tree in stage) / len(stage)
        total_sum += stage_sum

    # ...then average the stage averages over all boosting stages.
    importances = total_sum / len(self.estimators_)
    return importances

scikit-learn/sklearn/ensemble/forest.py

def feature_importances_(self):
    # Fetch each tree's normalized importances in parallel...
    all_importances = Parallel(n_jobs=self.n_jobs, backend="threading")(
        delayed(getattr)(tree, 'feature_importances_')
        for tree in self.estimators_)
    # ...and average them over the whole forest.
    return sum(all_importances) / self.n_estimators

So: different trees, and different ways of combining them.
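As a quick sanity check (a sketch, not from the answer: it re-fits small models and assumes scikit-learn; recent versions skip degenerate single-node trees and renormalize at the end, so tiny discrepancies are possible), the averaging above can be reproduced by hand:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

    X, y = make_regression(n_samples=500, n_features=5, random_state=0)
    rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    gbr = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)

    # Forest: the ensemble attribute is the mean of the per-tree importances.
    rf_manual = np.mean([t.feature_importances_ for t in rf.estimators_], axis=0)
    print(np.allclose(rf_manual, rf.feature_importances_))

    # Boosting: estimators_ is a 2-D array of trees (n_stages x K); same averaging.
    gbr_manual = np.mean(
        [t.feature_importances_ for stage in gbr.estimators_ for t in stage], axis=0)
    print(gbr_manual)
    print(gbr.feature_importances_)  # should agree closely, modulo version details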

elyase
    Even if the code is not the same, both do essentially the same thing: averaging `tree.feature_importances_` over every tree `tree` in the forest or boosting ensemble. The differences between the scores stem only from the fact that the trees are built using different algorithms. – Gilles Louppe Feb 21 '15 at 18:06
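To illustrate that last point (a sketch, not from the thread; the depths and data are purely illustrative): a single depth-limited tree, like the shallow trees gradient boosting grows by default, already produces noticeably different MDI importances than a fully grown tree on the same data, so the ensemble-level differences follow from how the trees are built rather than from how the importances are combined.

    # Sketch: MDI importances of a shallow tree vs. a fully grown tree.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=1000, n_features=5, noise=10.0, random_state=0)

    shallow = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
    deep = DecisionTreeRegressor(max_depth=None, random_state=0).fit(X, y)

    print("depth 3:   ", np.round(shallow.feature_importances_, 3))
    print("unlimited: ", np.round(deep.feature_importances_, 3))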