I have a question to ask:
How exactly do the L1 and L2 regularization terms on weights differ in the XGBoost algorithm?
As I understand it, L1 is used by LASSO and L2 by ridge regression, and L1 can shrink coefficients to exactly 0 while L2 can't. I understand the mechanics when using simple linear regression, but I have no clue how this works in tree-based models.
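To illustrate what I mean in the linear case, here is a minimal sketch I wrote myself (a single proximal/closed-form shrinkage step on one weight, pure Python, not taken from any library):

```python
# One-step comparison of L1 vs. L2 shrinkage on a single weight w.
# L1's proximal operator (soft-thresholding) can set w exactly to 0;
# L2 shrinkage only scales w toward 0 and never reaches it.

def shrink_l1(w, alpha):
    """Soft-thresholding: sign(w) * max(|w| - alpha, 0)."""
    if w > alpha:
        return w - alpha
    if w < -alpha:
        return w + alpha
    return 0.0

def shrink_l2(w, lam):
    """Closed-form ridge shrinkage of a single weight: w / (1 + lam)."""
    return w / (1.0 + lam)

w = 0.3
print(shrink_l1(w, alpha=0.5))  # 0.0 -> L1 zeroes the small weight
print(shrink_l2(w, lam=0.5))    # 0.2 -> L2 only shrinks it
```

This is the behavior I understand for linear models; my question is what the analogue is for the leaf weights of a tree.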
Furthermore, gamma is another parameter that makes the model more conservative. How should I understand the difference between the L1/L2 terms and the gamma parameter?
I have found very little on this in the documentation:
lambda [default=1, alias: reg_lambda]
- L2 regularization term on weights. Increasing this value will make model more conservative.
alpha [default=0, alias: reg_alpha]
- L1 regularization term on weights. Increasing this value will make model more conservative.
gamma [default=0, alias: min_split_loss]
- Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger gamma is, the more conservative the algorithm will be.
All of them range from 0 to inf.
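For reference, this is how I'm passing these parameters through the Python API (a minimal sketch using the documented aliases `reg_lambda`, `reg_alpha`, and `gamma`; the `xgboost.train` call is commented out since it needs a `DMatrix` of data):

```python
# Parameter dict for xgboost.train(); the names are the documented aliases
# quoted above, set here to their stated defaults.
params = {
    "objective": "reg:squarederror",
    "reg_lambda": 1.0,  # lambda: L2 term on leaf weights (default 1)
    "reg_alpha": 0.0,   # alpha: L1 term on leaf weights (default 0)
    "gamma": 0.0,       # min_split_loss: min gain required to split (default 0)
}
# booster = xgboost.train(params, dtrain)  # assuming dtrain is a DMatrix
```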
Thanks in advance for any answer/comment!