
I have a question:

How exactly do the L1 and L2 regularization terms on weights work in the XGBoost algorithm?

As I understand it, L1 is used by LASSO and L2 by ridge regression, and L1 can shrink weights to 0 while L2 can't. I understand the mechanics in simple linear regression, but I have no clue how it works in tree-based models.

Furthermore, gamma is another parameter that makes the model more conservative. How should I understand the difference between L1/L2 and the gamma parameter?

I have found very little on this in the documentation:

lambda [default=1, alias: reg_lambda]

  • L2 regularization term on weights. Increasing this value will make model more conservative.

alpha [default=0, alias: reg_alpha]

  • L1 regularization term on weights. Increasing this value will make model more conservative.

gamma [default=0, alias: min_split_loss]

  • Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger gamma is, the more conservative the algorithm will be.

All of them range from 0 to inf.
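For intuition, the XGBoost paper derives the optimal weight of a leaf from the sums of gradients and Hessians of the instances in that leaf; lambda appears in the denominator (smooth L2 shrinkage) while alpha soft-thresholds the gradient sum (and can zero the weight out, LASSO-style). The sketch below is my own illustration of that formula, not code from the library:

```python
def optimal_leaf_weight(grad_sum, hess_sum, reg_alpha=0.0, reg_lambda=1.0):
    """Sketch of the optimal leaf weight w* = -G / (H + lambda),
    with L1 (alpha) applied as soft-thresholding on the gradient sum G."""
    # L1: shrink the gradient sum toward zero by alpha; if it crosses zero,
    # the leaf weight becomes exactly 0 (the LASSO-like effect).
    if grad_sum > reg_alpha:
        g = grad_sum - reg_alpha
    elif grad_sum < -reg_alpha:
        g = grad_sum + reg_alpha
    else:
        return 0.0
    # L2: lambda in the denominator shrinks the weight smoothly toward zero.
    return -g / (hess_sum + reg_lambda)
```

So both parameters act on the leaf weights of each tree, which is why increasing either makes the model more conservative: predictions of every leaf are pulled toward zero.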

Thanks in advance for any answer/comment!

Vojtech Stas

2 Answers


I have found this blog post on gamma regularization. It's the best I have found in two months. I'm still searching for answers, but the lack of information is huge.

Vojtech Stas

I am wondering about this as well. The best resource I have found is the original XGBoost paper. Section 2.1 makes it sound as if XGBoost uses regression trees as the main building block for both regression and classification. If this is correct, then alpha and lambda probably work on the leaf weights in much the same way as they do in linear regression.

Gamma controls how deep the trees will be. A large gamma means a large hurdle to add another tree level, so a larger gamma regularizes the model by growing shallower trees. E.g., a depth-2 tree has a smaller range of predicted values than a depth-10 tree, so such a model will have lower variance.
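To make the "hurdle" concrete: the paper's split-gain formula subtracts gamma as a fixed penalty for each split, and a split is only kept if the resulting gain is positive. A minimal sketch (my own illustration of the formula, using the per-leaf gradient/Hessian sums G and H):

```python
def split_gain(G_left, H_left, G_right, H_right, reg_lambda=1.0, gamma=0.0):
    """Gain of a candidate split per the XGBoost paper:
    0.5 * (score_left + score_right - score_parent) - gamma.
    The split is kept only when this gain is positive."""
    def score(G, H):
        # Quality of a leaf with gradient sum G and Hessian sum H
        return G * G / (H + reg_lambda)
    parent = score(G_left + G_right, H_left + H_right)
    return 0.5 * (score(G_left, H_left) + score(G_right, H_right) - parent) - gamma
```

With gamma = 0 any split with positive gain is accepted; raising gamma rejects weak splits outright, which is a different mechanism from alpha/lambda shrinking the leaf weights themselves.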

Moysey Abramowitz