Results not matching between xgb.train and xgb.XGBRegressor in Python

Question

XGBoost offers two interfaces in Python, the low-level API (xgboost.train) and the scikit-learn wrapper (xgboost.XGBRegressor), as discussed here.

When I ran the same dataset through both interfaces, the results were different.

Using the low-level API - xgboost.train(..)

import xgboost

# Build the training matrix; missing=0.0 makes XGBoost treat zeros as missing values
dtrain = xgboost.DMatrix(X, label=Y, missing=0.0)
param = {'max_depth': 3, 'objective': 'reg:squarederror', 'booster': 'gbtree'}
evallist = [(dtrain, 'eval'), (dtrain, 'train')]
num_round = 10
# xgboost.train returns a Booster, not a DMatrix
booster = xgboost.train(param, dtrain, num_round, evallist)

Output

[0] eval-rmse:7115.31   train-rmse:7115.31
[1] eval-rmse:5335.37   train-rmse:5335.37
[2] eval-rmse:4054.77   train-rmse:4054.77
[3] eval-rmse:3140.91   train-rmse:3140.91
[4] eval-rmse:2510.33   train-rmse:2510.33
[5] eval-rmse:2080.62   train-rmse:2080.62
[6] eval-rmse:1785.53   train-rmse:1785.53
[7] eval-rmse:1571.92   train-rmse:1571.92
[8] eval-rmse:1399.57   train-rmse:1399.57
[9] eval-rmse:1301.64   train-rmse:1301.64

Using the scikit-learn wrapper - xgboost.XGBRegressor(..)

# Settings not passed here fall back to the wrapper's defaults
xgb_reg = xgboost.XGBRegressor(max_depth=3, n_estimators=10)
xgb_reg.fit(X_train, Y_train, eval_set=[(X_train, Y_train), (X_train, Y_train)], eval_metric='rmse', verbose=True)

Output

[0] validation_0-rmse:8827.63   validation_1-rmse:8827.63
[1] validation_0-rmse:8048.16   validation_1-rmse:8048.16
[2] validation_0-rmse:7349.83   validation_1-rmse:7349.83
[3] validation_0-rmse:6720.69   validation_1-rmse:6720.69
[4] validation_0-rmse:6154.82   validation_1-rmse:6154.82
[5] validation_0-rmse:5637.49   validation_1-rmse:5637.49
[6] validation_0-rmse:5173.9    validation_1-rmse:5173.9
[7] validation_0-rmse:4759.14   validation_1-rmse:4759.14
[8] validation_0-rmse:4386.29   validation_1-rmse:4386.29
[9] validation_0-rmse:4051.63   validation_1-rmse:4051.63

I suspected that the parameters were causing the difference, so I fetched the parameters from the scikit-learn wrapper and passed them to the low-level API, but the results still differed. Code for fetching the parameters:

xgb_reg.get_params()
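
To see which settings actually differ, the two parameter sets can be compared key by key. The following is a minimal sketch, not part of the original post: `diff_params` and `SKLEARN_TO_NATIVE` are hypothetical names, and the sample dictionaries are illustrative values (the 0.3 and 0.1 learning rates are the ones quoted in the comments below), not output captured from the run above. It relies only on `learning_rate` being the wrapper-style name for the native `eta` setting.

```python
# Hypothetical helper; the names and sample values are illustrative assumptions.
SKLEARN_TO_NATIVE = {"learning_rate": "eta"}  # wrapper name -> native name


def diff_params(native, sklearn_params):
    """Return {name: (native_value, wrapper_value)} for settings that differ."""
    # Translate wrapper-style names to native names and drop unset (None) entries.
    translated = {
        SKLEARN_TO_NATIVE.get(key, key): value
        for key, value in sklearn_params.items()
        if value is not None
    }
    all_keys = sorted(set(native) | set(translated))
    # A value of None in the result means that side did not set the key explicitly.
    return {
        key: (native.get(key), translated.get(key))
        for key in all_keys
        if native.get(key) != translated.get(key)
    }


native = {"max_depth": 3, "objective": "reg:squarederror",
          "booster": "gbtree", "eta": 0.3}
wrapper = {"max_depth": 3, "objective": "reg:squarederror",
           "booster": "gbtree", "learning_rate": 0.1}

print(diff_params(native, wrapper))  # {'eta': (0.3, 0.1)}
```

A comparison like this only covers settings that appear in the dictionaries. Arguments passed elsewhere, such as the number of boosting rounds or the `missing` marker given to `DMatrix`, would have to be checked separately.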

What could explain why the results do not match between the two interfaces, given that they are similar internally?

Please check the duplicate question marked above. There is a difference that the `xgb_reg.get_params()` cannot handle. Please let me know if you have tried that already and still not matching and also update your code with [a reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) so that I can reopen this. — Vivek Kumar, Dec 03 '19 at 12:34
The default parameters are different. For example learning_rate is 0.1 in Scikit-learn and 0.3 in the API. This happens with more parameters. Try to fix them. — user2874583, Dec 03 '19 at 15:51
Thanks, once I matched learning rate and max depth I was able to match the data for the small dataset. But having some trouble matching the results for the larger datasets like the boston dataset [here](https://stackoverflow.com/questions/59395651/difference-is-value-between-xgb-train-and-xgb-xgbregressor-in-python-for-certain) — Allen, Dec 18 '19 at 15:53
