6

I have a very simple dataset (30 rows, 32 columns).

I wrote a Python program to load the data and train an XGBoost model, then save the model to disk.

I also compiled a C++ program that uses libxgboost (the C API) and loads the model for inference.

When using the SAME saved model, Python and C++ give different results for the same input (a single row of all zeros).

The XGBoost version is 0.90, and I have attached all files (including the numpy data files) here:

https://www.dropbox.com/s/txao5ugq6mgssz8/xgboost_mismatch.tar?dl=0

Here are the outputs of the two programs (the sources of which are in the .tar file):

The Python program

(which prints a few strings while building the model and THEN prints the single number output)

$ python3 jl_functions_tiny.py
Loading data
Creating model
Training model
Saving model
Deleting model
Loading model
Testing model
[587558.2]

The C++ program

(which emits a single number that clearly doesn't match the single Python number output)

$ ./jl_functions
628180.062500
user5406764
  • What does "Deleting model" mean? Is it deleted from memory but still present on disk? – Shihab Shahriar Khan Oct 15 '19 at 09:09
  • Exactly. This is actually extraneous because of Python's garbage collector, but I explicitly deleted it to demonstrate that the newly loaded model comes only from disk and nowhere else. – user5406764 Oct 15 '19 at 10:38
  • Wondering if you resolved this. Saving a model as JSON with 1.1.1 and loading it in C++ also gives different results on a model I'm trying. – Nick Jun 14 '20 at 00:04

2 Answers

1

A different seed parameter in Python and in C++ can cause different results, since the algorithm uses randomness. Try setting seed= on line 11 (the xgb.XGBRegressor call) to the same value in Python and in C++, or set it via NumPy using numpy.random.seed(0). In C++, use the seed parameter from /workspace/include/xgboost/generic_parameters.h.

Omer Anisfeld
0

a) You are saving your model with model.save, which has issues with feature vector ordering. You could try model.dump instead; see "xgboost load model in c++ (python -> c++ prediction scores mismatch)".

b) Please check that your Python code is not using a sparse matrix to create the model. My intuition says the problem is here.

Disclaimer: I am not an expert in C++, but from what I figured out, this might be the reason for the mismatched predictions. I don't have an environment handy to test your C++ code and share results.
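
To illustrate why point b) could matter even for an all-zeros row: XGBoost sends missing values down a per-node default branch, and entries absent from a sparse matrix are treated as missing. The toy below is not XGBoost code and is not taken from the question's tarball. The single-split tree, its threshold, its leaf values, and the predict helper are all invented for illustration. It only shows that the same zero can land in different leaves depending on whether it is read as a real value or as missing. Whether that is what happens in the question is an untested assumption.

```python
import math

# A hypothetical single-split tree, in the spirit of a dump line such as
#   0:[f0<0.5] yes=1,no=2,missing=2
# The threshold, leaf values, and default direction are made up.
TREE = {
    "feature": 0,
    "threshold": 0.5,
    "yes_leaf": 10.0,      # taken when value < threshold
    "no_leaf": 20.0,       # taken when value >= threshold
    "missing_leaf": 20.0,  # default direction for missing values
}


def predict(row, missing=math.nan):
    """Score one dense row, treating entries equal to `missing` as absent."""
    value = row[TREE["feature"]]
    if math.isnan(missing):
        is_missing = math.isnan(value)
    else:
        is_missing = value == missing
    if is_missing:
        return TREE["missing_leaf"]
    return TREE["yes_leaf"] if value < TREE["threshold"] else TREE["no_leaf"]


zeros = [0.0] * 32  # an all-zeros row with 32 columns, as in the question

dense_score = predict(zeros)                     # zero is a real value
sparse_like_score = predict(zeros, missing=0.0)  # zero is treated as missing
print(dense_score, sparse_like_score)
```

If the two programs disagree about what counts as missing (for example, through the missing argument passed when the input matrix is built), a check along these lines on the real model dump might show which branch each side takes.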

Ted Lyngmo
nithin
  • I dumped the model in Python (model_dump) and in C++ (XGBoosterDumpModel). The results were _exactly_ the same, which is a good sign that they are both able to read in the model the same way. I also re-checked the code (please see the tarball in the question for reference) and I'm definitely not using a sparse matrix. Thanks for both ideas. But even if the features were in a different order, the C++ and Python code are both feeding in all zeros to the model, so they should still have the same output. So I think while your ideas are very awesome, I am still stuck. – user5406764 Oct 17 '19 at 12:04