
The 'merror' and 'logloss' results from XGB multiclass classification differ by about 0.01 or 0.02 on each run, even with the same parameters. Is this normal?

I want 'merror' and 'logloss' to be constant when I run XGB with the same parameters so I can evaluate the model precisely (e.g. when I add a new feature).

As it stands, if I add a new feature I can't really tell whether it improved my model's accuracy, because 'merror' and 'logloss' differ on each run regardless of whether I changed the model or the data fed into it since the last run.

Should I try to fix this, and if so, how can I do it?

Ian Dzindo
  • It must be something to do with random numbers. Probably with how the initial weights are set... Anyhow setting a constant seed will do the trick. I don't know how to do that with XgBoost unfortunately. It shouldn't be too hard to find out though. Good luck! – Hadus Jun 04 '18 at 20:25
  • Something like: https://stackoverflow.com/a/21494630/6304086 – Hadus Jun 04 '18 at 20:27
  • 1
    When you create an instance of the classifier there should be an argument called `random_state` that you should set to a number. – Hadus Jun 04 '18 at 20:34

1 Answer


Managed to solve this. First I set the 'seed' parameter of XGBoost to a fixed value, as Hadus suggested. Then I realized that I had used sklearn's train_test_split earlier in the notebook without setting its random_state parameter to a fixed value. So I set random_state to 22 (any fixed integer works) and now I'm getting constant results.
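For reference, here is a minimal sketch of those two changes. The dataset, parameter values, `num_class`, and `num_boost_round` below are placeholders rather than my actual setup, and I use 'mlogloss' (XGBoost's multiclass logloss metric) alongside 'merror':

```python
# Sketch of the two fixes: a fixed random_state for train_test_split
# and a fixed seed for XGBoost itself. Data and parameters are placeholders.
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Placeholder multiclass data (3 classes).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=6,
    n_classes=3, random_state=22,
)

# Fix the split so the same rows land in train/test on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=22
)

params = {
    "objective": "multi:softprob",
    "num_class": 3,                         # match your number of classes
    "eval_metric": ["merror", "mlogloss"],
    "seed": 22,                             # fixes XGBoost's own RNG
}

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# With both seeds fixed, 'merror' and 'mlogloss' should be identical
# across runs on the same hardware and library version.
booster = xgb.train(params, dtrain, num_boost_round=50,
                    evals=[(dtest, "eval")])
```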

Ian Dzindo