I have a problem with XGBoost, which I use at work. My task is to port a piece of code that currently runs in R to Python.
What the code does: I use XGBoost to determine the features with the highest gain. I made sure the inputs to XGBoost are identical in R and Python. XGBoost is run roughly 100 times (on different data), and each time I extract the 30 best features by gain.
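To make the selection step concrete, here is a minimal sketch of it in Python. It is for illustration only and is not the confidential code. It assumes the per-feature gain scores are already available as a dictionary, for example from `Booster.get_score(importance_type="gain")`; the function name `top_features_by_gain` and the example scores are hypothetical.

```python
from typing import Dict, List


def top_features_by_gain(gain_scores: Dict[str, float], n: int = 30) -> List[str]:
    """Return up to n feature names, ordered from highest to lowest gain."""
    # Sort by gain (descending); break ties by feature name so that the
    # result is deterministic across runs.
    ranked = sorted(gain_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


# Hypothetical scores standing in for the output of
# booster.get_score(importance_type="gain") on a trained model.
example_scores = {"f0": 0.5, "f1": 2.0, "f2": 1.25, "f3": 2.0}
top3 = top_features_by_gain(example_scores, n=3)
print(top3)
```

Features that are never used in a split do not appear in the score dictionary, so a round can return fewer than 30 features.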
My problem is this: the inputs in R and Python are identical, yet Python and R output vastly different features (both in the total number of features per round and in which features are chosen). They share only about 50% of the features. My parameters are the same, and I don't use any subsampling, so there should be no randomness.
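For reference, this is roughly how the overlap between the two feature lists for one round can be measured. It is a sketch with made-up feature names, not the real data; the function name `feature_overlap` is hypothetical.

```python
from typing import Dict, Iterable, List, Union


def feature_overlap(
    r_features: Iterable[str], py_features: Iterable[str]
) -> Dict[str, Union[List[str], float]]:
    """Compare the features selected in R with those selected in Python."""
    r_set, py_set = set(r_features), set(py_features)
    shared = r_set & py_set
    union = r_set | py_set
    return {
        "shared": sorted(shared),
        "only_r": sorted(r_set - py_set),
        "only_python": sorted(py_set - r_set),
        # Jaccard similarity: shared features divided by all distinct features.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Made-up feature names for a single round.
r_round = ["a", "b", "c", "d"]
py_round = ["b", "c", "e", "f"]
result = feature_overlap(r_round, py_round)
print(result["shared"], result["only_r"], result["only_python"])
```

Running this for each of the roughly 100 rounds shows whether the disagreement is uniform or concentrated in a few rounds.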
Another thing I noticed: XGBoost is slower in Python than in R with the same parameters. Is this a known issue?
I've looked around but haven't found anyone with a similar problem. I can't share the data or code because they are confidential. Does anyone have an idea why the features differ so much?
R version: 3.4.3
XGBoost R version: 0.6.4.1
Python version: 3.6.5
XGBoost Python version: 0.71
Running on Windows.