
The 'merror' and 'logloss' results from XGB multiclass classification differ by about 0.01 or 0.02 on each run, even with the same parameters. Is this normal?

I want 'merror' and 'logloss' to be constant when I run XGB with the same parameters so I can evaluate the model precisely (e.g. when I add a new feature).

As it stands, if I add a new feature I can't really tell whether it improved my model's accuracy, because 'merror' and 'logloss' differ on each run regardless of whether I changed the model or the data fed into it since the last run.

Should I try to fix this, and if so, how can I do it?

Ian Dzindo
  • It must be something to do with random numbers. Probably with how the initial weights are set... Anyhow setting a constant seed will do the trick. I don't know how to do that with XgBoost unfortunately. It shouldn't be too hard to find out though. Good luck! – Hadus Jun 04 '18 at 20:25
  • Something like: https://stackoverflow.com/a/21494630/6304086 – Hadus Jun 04 '18 at 20:27
  • 1
    When you create an instance of the classifier there should be an argument called `random_state` that you should set to a number. – Hadus Jun 04 '18 at 20:34

1 Answer


Managed to solve this. First I set the 'seed' parameter of XGBoost to a fixed value, as Hadus suggested. Then I realized that I had used sklearn's train_test_split earlier in the notebook without setting its random_state parameter to a fixed value. So I set random_state to 22 (any fixed integer works) and now I'm getting constant results.
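For reference, here is a minimal sketch of those two changes. The dataset, parameter values, `num_class`, and `num_boost_round` below are placeholders rather than my actual setup, and I use 'mlogloss' (XGBoost's multiclass logloss metric) alongside 'merror':

```python
# Sketch of the two fixes: a fixed random_state for train_test_split
# and a fixed seed for XGBoost itself. Data and parameters are placeholders.
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Placeholder multiclass data (3 classes).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=6,
    n_classes=3, random_state=22,
)

# Fix the split so the same rows land in train/test on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=22
)

params = {
    "objective": "multi:softprob",
    "num_class": 3,                         # match your number of classes
    "eval_metric": ["merror", "mlogloss"],
    "seed": 22,                             # fixes XGBoost's own RNG
}

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# With both seeds fixed, 'merror' and 'mlogloss' should be identical
# across runs on the same hardware and library version.
booster = xgb.train(params, dtrain, num_boost_round=50,
                    evals=[(dtest, "eval")])
```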

Ian Dzindo