12

I have a training pipeline that relies heavily on XGBoost rather than scikit-learn, purely because of the way XGBoost cleanly handles null values.

However, I'm tasked with introducing non-technical folks to machine learning, and thought it'd be good to take the idea of a single-tree classifier and talk about how XGBoost generally takes that data structure and "puts it on steroids." Specifically, I want to plot this single-tree classifier to show cutpoints.

Would specifying n_estimators=1 be roughly equivalent to using scikit's DecisionTreeClassifier?

blacksite
  • AFAIK it is, but why not try it and provide an example, like here: [Why is Random Forest with a single tree much better than a Decision Tree classifier?](https://stackoverflow.com/questions/48239242/why-is-random-forest-with-a-single-tree-much-better-than-a-decision-tree-classif) Otherwise, this sounds like a theoretical question, and hence not exactly suitable for SO... – desertnaut Nov 09 '18 at 17:02
  • Sure, why not :) – blacksite Nov 09 '18 at 17:23
  • Barring any mistakes on my investigative end, they look to be the same. – blacksite Nov 09 '18 at 17:42
  • Indeed; not quite sure about the exact meaning of `reg_lambda` - maybe this should also be set to 0 (see [this discussion](https://github.com/dmlc/xgboost/issues/2589))? Now, I seriously suggest you take your update and post it as an answer to your initial question... :) – desertnaut Nov 09 '18 at 18:39

3 Answers

10
import subprocess

import numpy as np
from xgboost import XGBClassifier, plot_tree

from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn import metrics

import matplotlib.pyplot as plt

RANDOM_STATE = 100
params = {
    'max_depth': 5,
    'min_samples_leaf': 5,
    'random_state': RANDOM_STATE
}

X, y = make_classification(
    n_samples=1000000,
    n_features=5,
    random_state=RANDOM_STATE
)

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=RANDOM_STATE)

# __init__(self, max_depth=3, learning_rate=0.1,
# n_estimators=100, silent=True,
# objective='binary:logistic', booster='gbtree',
# n_jobs=1, nthread=None, gamma=0,
# min_child_weight=1, max_delta_step=0,
# subsample=1, colsample_bytree=1, colsample_bylevel=1,
# reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
# base_score=0.5, random_state=0, seed=None, missing=None, **kwargs)
# note: min_samples_leaf is not an XGBoost parameter; the sklearn wrapper
# passes it through to the booster, which ignores it (the closest XGBoost
# analogue is min_child_weight)
xgb_model = XGBClassifier(
    n_estimators=1,
    max_depth=3,
    min_samples_leaf=5,
    random_state=RANDOM_STATE
)

# __init__(self, criterion='gini',
# splitter='best', max_depth=None,
# min_samples_split=2, min_samples_leaf=1,
# min_weight_fraction_leaf=0.0, max_features=None,
# random_state=None, max_leaf_nodes=None,
# min_impurity_decrease=0.0, min_impurity_split=None,
# class_weight=None, presort=False)
sk_model = DecisionTreeClassifier(
    max_depth=3,
    min_samples_leaf=5,
    random_state=RANDOM_STATE
)

xgb_model.fit(Xtrain, ytrain)
xgb_pred = xgb_model.predict(Xtest)

sk_model.fit(Xtrain, ytrain)
sk_pred = sk_model.predict(Xtest)

print(metrics.classification_report(ytest, xgb_pred))
print(metrics.classification_report(ytest, sk_pred))

plot_tree(xgb_model, rankdir='LR')
plt.show()

export_graphviz(sk_model, 'sk_model.dot')
subprocess.call('dot -Tpng sk_model.dot -o sk_model.png'.split())

Some performance metrics (I know, I didn't fully tune the classifiers)...

>>> print(metrics.classification_report(ytest, xgb_pred))
              precision    recall  f1-score   support

           0       0.86      0.82      0.84    125036
           1       0.83      0.87      0.85    124964

   micro avg       0.85      0.85      0.85    250000
   macro avg       0.85      0.85      0.85    250000
weighted avg       0.85      0.85      0.85    250000

>>> print(metrics.classification_report(ytest, sk_pred))
              precision    recall  f1-score   support

           0       0.86      0.82      0.84    125036
           1       0.83      0.87      0.85    124964

   micro avg       0.85      0.85      0.85    250000
   macro avg       0.85      0.85      0.85    250000
weighted avg       0.85      0.85      0.85    250000

And some pictures:

[tree plot: scikit-learn model] [tree plot: XGBoost model]

So, barring any investigative mistakes/overgeneralizations, an XGBClassifier (and, I would assume, Regressor) with one estimator seems identical to a scikit-learn DecisionTreeClassifier with the same shared parameters.

desertnaut
blacksite
  • I don't think your answer is strictly right. If you do `np.array_equal(xgb_pred, sk_pred)` it will return `True`, since a threshold of 0.5 is applied to separate the positive and negative classes. However, if you do `np.array_equal(xgb_proba, sk_proba)`, where `sk_proba = sk_model.predict_proba(Xtest)` and `xgb_proba = xgb_model.predict_proba(Xtest)`, it will return `False`. Hence, your `XGBClassifier` example is not identical to a `DecisionTreeClassifier` (and their scores differ significantly); for example, in the first row the probability is `0.9896` for xgb and `0.643` for the DecisionTree. – Chris Apr 07 '22 at 16:01
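
To make that concrete, here is a minimal sketch of my own (not part of the original answer) that reuses `Xtrain`, `Xtest`, `ytrain` and `sk_model` from the code above and refits the single-tree XGBoost model with the shrinkage and the L2 leaf penalty switched off, as suggested in the comments under the question:

import numpy as np
from xgboost import XGBClassifier

# single tree, no shrinkage of the leaf values, no L2 penalty on the leaf weights
xgb_plain = XGBClassifier(
    n_estimators=1,
    max_depth=3,
    learning_rate=1.0,
    reg_lambda=0,
    random_state=RANDOM_STATE
)
xgb_plain.fit(Xtrain, ytrain)

# the hard class labels typically agree with the sklearn tree...
print(np.array_equal(xgb_plain.predict(Xtest), sk_model.predict(Xtest)))

# ...but the probabilities generally still differ, because XGBoost's leaf values
# come from a Newton step on the logistic loss rather than from the raw class
# proportions stored in a DecisionTreeClassifier leaf
print(np.allclose(xgb_plain.predict_proba(Xtest), sk_model.predict_proba(Xtest)))

In other words, the equivalence should be judged on the hard predictions (and the plotted splits), not on the predicted probabilities.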
1

If you set n_estimators=1, it will behave essentially like a single decision tree. There are several criteria for splitting nodes (such as the Gini index and entropy), and I'm not sure which one scikit-learn and xgboost each use by default, but for this demonstration it doesn't matter much.
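
For what it's worth, here is a tiny sketch of my own (not from this answer) of that knob: scikit-learn exposes the split criterion directly, while XGBoost has no such option because it scores candidate splits by the gain, i.e. the reduction in its regularised training loss.

from sklearn.tree import DecisionTreeClassifier

# scikit-learn lets you choose the impurity measure used to pick splits
tree_gini = DecisionTreeClassifier(criterion='gini', max_depth=3)
tree_entropy = DecisionTreeClassifier(criterion='entropy', max_depth=3)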

You want to show the core features and deep ideas behind building a decision tree. I recommend the following lecture by Prof. Patrick Winston. I've used it myself to demonstrate how decision trees work to my peers, and it went well.

Then, you can add the idea of boosting into the mix. Patrick also lectures about it here.

Eran Moshe
1

Setting n_estimators=1 in XGBoost makes the algorithm generate a single tree (essentially no boosting), which is similar to sklearn's single-tree algorithm, DecisionTreeClassifier.

But the hyperparameters that can be tuned and the tree-generation process differ between the two. Although sklearn's DecisionTreeClassifier lets you tune more tree-specific hyperparameters than xgboost, xgboost can yield better accuracy after hyperparameter tuning: a single tree generated by xgboost can be better than a single tree generated by the sklearn DecisionTreeClassifier.
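
As a rough illustration of that difference (my own mapping, not from this answer), the closest counterparts look something like this; the correspondence is only approximate, because XGBoost grows its tree against a regularised loss:

from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# rough analogues only: min_child_weight is a hessian-weighted minimum (for the
# logistic loss each sample contributes at most 0.25), and gamma is a minimum
# loss reduction rather than an impurity decrease
sk_tree = DecisionTreeClassifier(
    max_depth=3,
    min_samples_leaf=5,
    min_impurity_decrease=0.0
)
xgb_tree = XGBClassifier(
    n_estimators=1,
    max_depth=3,
    min_child_weight=5,
    gamma=0.0
)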

Another advantage of xgboost is that it handles missing values on its own. With DecisionTreeClassifier, we have to handle missing values explicitly (for example, by imputing them), which may yield different results.
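
A minimal sketch of that difference, with a tiny made-up dataset (illustrative only): XGBoost accepts NaNs directly and learns a default direction for them at each split, whereas the usual approach with DecisionTreeClassifier is to impute first.

import numpy as np
from xgboost import XGBClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 0.5], [4.0, 1.0]])
y = np.array([0, 1, 0, 1])

# xgboost handles the NaNs natively
XGBClassifier(n_estimators=1, max_depth=2).fit(X, y)

# for the sklearn tree, fill the NaNs first (e.g. mean imputation)
X_imputed = SimpleImputer(strategy='mean').fit_transform(X)
DecisionTreeClassifier(max_depth=2).fit(X_imputed, y)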

So, go for xgboost with n_estimators=1 over sklearn DecisionTreeClassifier!

Kartik