
I am using the Learning API version of xgboost. I want to get the coefficients of a linear model built with it, but it results in the error AttributeError: 'Booster' object has no attribute 'coef_'. The Learning API documentation doesn't appear to address how to retrieve coefficients.

import xgboost as xgb

# xtrain, ytrain, xtest, and ytest are numpy arrays
dtrain = xgb.DMatrix(xtrain, label=ytrain)
dtest = xgb.DMatrix(xtest, label=ytest)
param = {'eta': 0.3125, 'objective': 'binary:logistic', 'nthread': 8, 'eval_metric': 'auc', 'booster': 'gblinear', 'max_depth': 12}
model = xgb.train(param, dtrain, 60, [(dtrain, 'train'), (dtest, 'eval')], verbose_eval=5, early_stopping_rounds=12)
print(model.coef_) #results in an error

I tried building an equivalent version of the above model using XGBRegressor, since it does have the coef_ attribute, but that model returns very different predictions. I looked at previous answers on this topic (1, 2), which imply that n_estimators is effectively the same as num_boost_round and that matching it should give the same predictions. But even after accounting for this, the predictions are very different with the parameters below; this model turns out to be extremely conservative. Also, per the documentation, nthread is the same as n_jobs. I don't see any other differences between the parameters of the two.

from xgboost import XGBRegressor

model = XGBRegressor(n_estimators=60, learning_rate=0.3125, max_depth=12, objective='binary:logistic', booster='gblinear', n_jobs=8)
model = model.fit(xtrain, ytrain, eval_metric='auc', early_stopping_rounds=12, eval_set=[(xtest, ytest)])
predictions = model.predict(xtrain, ntree_limit=0)  # ntree_limit=0 works around a bug with early_stopping_rounds for gblinear

My questions are:

  1. Is there a way to get coefficients for a model built using xgb.train for a linear model, and if so how may I do it?
  2. If not, why is XGBRegressor giving me different results?
Jojo

1 Answer


For the first question, you can get the weights by dumping the model to a text file:

model.dump_model('path/dump.raw.txt')

Unlike an sklearn model dump, this dump is meant for inspection and interpretation only, so the weights cannot be loaded back into xgboost.

Then dump.raw.txt will look like:

booster[0]:
bias:
0.0102652
weight:
-0.000597852
0.0400338
-0.00014682
0.00499299
0.0111505
-0.092625
-0.0132113
-0.00796503
0.00351845
0.00833504
0.0219131
-0.00388152
-0.000771679
-0.00585201
0.00893034
-0.00267784
-0.000711578
-0.00535324
-0.0062664
-0.00439571
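
If you want the coefficients as numbers rather than text, here is a minimal parsing sketch of my own (not part of the original answer), assuming the dump has exactly the layout shown above: a "bias:" line followed by one value, then a "weight:" line followed by one value per feature.

# Sketch only: parse the gblinear dump back into a bias and a weight list.
bias = None
weights = []
section = None
with open('path/dump.raw.txt') as f:
    for raw in f:
        line = raw.strip()
        if not line:
            continue
        if line.startswith('booster['):
            section = None          # new booster block; reset state
        elif line == 'bias:':
            section = 'bias'
        elif line == 'weight:':
            section = 'weight'
        elif section == 'bias':
            bias = float(line)
        elif section == 'weight':
            weights.append(float(line))

print(bias)
print(weights[:5])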

For the second question, from the docs (Parameters for Linear Booster):

"Using gblinear booster with shotgun updater is nondeterministic as it uses Hogwild algorithm."

So the shotgun updater causes non-deterministic results across runs. To get deterministic results, set the updater in the params:

'updater':'coord_descent'

then your params would look like:

{'updater': 'coord_descent', 'eta': 0.3125, 'objective': 'binary:logistic', 'nthread': 8, 'eval_metric': 'auc', 'booster': 'gblinear', 'max_depth': 12}
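
For illustration, a sketch of my own (not from the original answer) of training with the coord_descent updater through both APIs, reusing dtrain, dtest, xtrain, and ytrain from the question. The variable names bst_linear and skl_linear are hypothetical, and passing updater as a keyword argument to the sklearn wrapper (forwarded via **kwargs) is an assumption about your xgboost version.

# Sketch only: same gblinear + coord_descent settings through both APIs.
params = {'booster': 'gblinear', 'updater': 'coord_descent', 'eta': 0.3125,
          'objective': 'binary:logistic', 'nthread': 8, 'eval_metric': 'auc'}
bst_linear = xgb.train(params, dtrain, 60,
                       evals=[(dtrain, 'train'), (dtest, 'eval')],
                       verbose_eval=5)

skl_linear = XGBRegressor(booster='gblinear', updater='coord_descent',
                          n_estimators=60, learning_rate=0.3125,
                          objective='binary:logistic', n_jobs=8)
skl_linear.fit(xtrain, ytrain)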

Finally, I noticed that the predict() method of the Booster produced by xgb.train() returns probabilities for 'binary:logistic', so it corresponds to the sklearn wrapper's predict_proba rather than a hard class prediction; compare the results with that in mind.
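
A small illustration of that point (my addition; here model is the Booster from xgb.train() in the question, and clf is a hypothetical XGBClassifier trained on the same data):

proba_train_api = model.predict(dtest)  # already probabilities for 'binary:logistic'
# proba_sklearn_api = clf.predict_proba(xtest)[:, 1]  # hypothetical XGBClassifier
# Compare the two arrays on the same (probability) scale, not against 0/1 labels.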

artunc
  • Thanks. That does help with the first question. But on the second question, adding `'updater':'coord_descent'` to the params of the model from `xgb.train()` still doesn't appear to bring the two sets of predictions close. – Jojo Feb 08 '21 at 21:56
  • Did you compare the weights? – artunc Feb 08 '21 at 22:21
  • I can tell that the difference is caused by different implementations (you are comparing different APIs), sampling, and the nature of ensemble models. However, I don't expect a high deviation between the two result sets despite the differences between the models. – artunc Feb 08 '21 at 22:34
  • Thanks, but the `xgb.train()` model predicts values that are about twice the value of `XGBRegressor()` for these parameters. It seems like the predictions for `XGBRegressor()` are overly conservative as well. – Jojo Feb 09 '21 at 01:03
  • Also, based on the links I provided in the question (i.e. [1](https://stackoverflow.com/questions/46943674/how-to-get-predictions-with-xgboost-and-xgboost-using-scikit-learn-wrapper-to-ma), [2](https://datascience.stackexchange.com/questions/17282/xgbregressor-vs-xgboost-train-huge-speed-difference)) it does appear that replicability across the apis is possible. – Jojo Feb 09 '21 at 02:25
  • I updated the answer: one model's predict method corresponds to the other's predict_proba, which is why you think you see different predictions. – artunc Feb 09 '21 at 09:43