87

From the XGBoost guide:

After training, the model can be saved.

bst.save_model('0001.model')

The model and its feature map can also be dumped to a text file.

# dump model
bst.dump_model('dump.raw.txt')
# dump model with feature map
bst.dump_model('dump.raw.txt', 'featmap.txt')

A saved model can be loaded as follows:

bst = xgb.Booster({'nthread': 4})  # init model
bst.load_model('model.bin')  # load model

My questions are the following.

  1. What's the difference between save_model & dump_model?
  2. What's the difference between saving '0001.model' and dumping 'dump.raw.txt' with 'featmap.txt'?
  3. Why is the file name used for loading, model.bin, different from the name used when saving, 0001.model?
  4. Suppose that I trained two models, model_A and model_B, and I want to save both for future use. Which save & load functions should I use? Could you show the clear process?
Pengju Zhao
  • You've asked a bunch of questions, but the code for `save_model`, `dump_model` and `load_model` to look into, if you're interested, is here: https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/core.py – Max Power Apr 29 '17 at 03:31
  • If your XGBoost model is trained with the sklearn wrapper, you can still save the model with `bst.save_model()` and load it with `bst = xgb.Booster(); bst.load_model()`. When you use `bst.predict(input)`, you need to convert your input into a DMatrix. – Jundong Nov 16 '18 at 18:50
  • I use `joblibs` more. For related discussion, see [pickle vs joblibs](https://stackoverflow.com/questions/12615525/what-are-the-different-use-cases-of-joblib-versus-pickle) and [sklearn guide for saving model](https://scikit-learn.org/stable/tutorial/basic/tutorial.html#model-persistence) – Travis Mar 26 '20 at 01:57

5 Answers

49

Here is how I solved the problem:

import pickle

file_name = "xgb_reg.pkl"

# save
with open(file_name, "wb") as f:
    pickle.dump(xgb_model, f)

# load
with open(file_name, "rb") as f:
    xgb_model_loaded = pickle.load(f)

# test: slicing keeps the 2D shape that predict() expects
ind = 1
test = X_val[ind:ind+1]
xgb_model_loaded.predict(test)[0] == xgb_model.predict(test)[0]

Out[1]: True
ChrisDanger
  • If your model is saved in pickle, you may lose support when you upgrade the xgboost version – Fontaine007 Mar 02 '20 at 08:27
  • This is a legitimate use-case - for example, pickling is the official recommendation to save a sklearn pipeline. This necessarily means that if one has an sklearn pipeline containing an XGBoost model, they must end up pickling XGBoost. If the concern is that somewhere down the road, an update to XGBoost may break the pickle's behavior, that's why version-pinning (and unit testing) exists. – AmphotericLewisAcid May 11 '21 at 02:38
  • @AmphotericLewisAcid version pinning shows where the problem is, but doesn't solve it – Qbik Aug 06 '22 at 13:34
  • "Don't use pickle or joblib as that may introduce dependencies on the xgboost version. The canonical way to save and restore models is by load_model and save_model." – Qbik Aug 06 '22 at 13:38
  • I disagree. Again, if you want to serialize a sklearn pipeline containing sklearn objects that implement the sklearn API, the official recommendation in the sklearn documentation is to pickle the objects. My comment is referring specifically to the case where you're using XGBoost's sklearn-compatible objects. Whatever XGBoost believes is "canonical" is irrelevant here - they're either compliant with the spec or not. And the work-around for their non-compliance is version-pinning to ensure the pickled objects work properly. – AmphotericLewisAcid Dec 11 '22 at 05:32
  • I must warn you that production models can be broken by a mere sklearn API change... if using pickles, you need to pin everything... And another argument against pickling models: Triton Inference Server (maintained by NVIDIA and available in every major public cloud) will not import a pickled tree-based model (it accepts only pure Boosters saved without the version-specific Python object header, i.e. as txt or json). – mirekphd Mar 07 '23 at 08:41
44

Both save_model and dump_model save the model; the difference is that with dump_model you can also save the feature names and export the trees in text format.

load_model will work with a model produced by save_model. The output of dump_model can be used, for example, with xgbfi.

When loading the model, you need to specify the path where your model is saved. In the example bst.load_model("model.bin"), the model is loaded from the file model.bin - it is just the name of the file containing the model. Good luck!

EDIT: According to the Xgboost documentation (version 1.3.3), dump_model() should be used to export the model for further interpretation, while save_model() and load_model() should be used for saving and loading the model. Please check the docs for more details.

There is also a difference between the Learning API and the Scikit-Learn API of Xgboost. The latter saves the best_ntree_limit variable, which is set during training with early stopping. You can read the details in my article How to save and load Xgboost in Python?
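
For illustration, here is a minimal sketch of the early-stopping case with the Scikit-Learn API, assuming XGBoost 1.x (where early_stopping_rounds is still a fit() argument) and placeholder arrays X_train, y_train, X_val, y_val:

import xgboost as xgb

model = xgb.XGBRegressor(n_estimators=1000)
model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    early_stopping_rounds=10,       # early stopping sets best_ntree_limit on the fitted model
)
model.save_model("model_es.json")   # the Scikit-Learn wrapper also stores this extra state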

The save_model() method recognizes the format from the file name: if *.json is specified, the model is saved as JSON; otherwise it is saved in the older binary format.
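
Putting it together, a minimal sketch (bst is assumed to be a Booster already trained with xgb.train, and the file names are placeholders):

import xgboost as xgb

bst.save_model("model.json")                   # .json extension -> JSON format
bst.dump_model("dump.raw.txt", "featmap.txt")  # human-readable dump, cannot be loaded back

bst2 = xgb.Booster()
bst2.load_model("model.json")                  # only files written by save_model can be loaded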

pplonski
23

Don't use pickle or joblib, as that may introduce dependencies on the xgboost version. The canonical way to save and restore models is with load_model and save_model.

If you’d like to store or archive your model for long-term storage, use save_model (Python) and xgb.save (R).

This is the relevant documentation for the latest versions of XGBoost. It also explains the difference between dump_model and save_model.

Note that you can serialize/de-serialize your models as JSON by specifying json as the extension when using bst.save_model. If the speed of saving and restoring the model is not important to you, this is very convenient, as it allows you to do proper version control of the model, since it's a simple text file.
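
For example, a model saved with the .json extension is an ordinary text file that you can diff and inspect; a minimal sketch, assuming bst is a trained Booster:

import json

bst.save_model("model.json")        # JSON format is chosen from the file extension

with open("model.json") as f:
    model_dict = json.load(f)       # the saved file is plain JSON
print(list(model_dict.keys()))      # e.g. a 'learner' section containing the trees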

user787267
22

An easy way of saving and loading an xgboost model is with the joblib library.

import joblib

filename = "xgb_model.joblib"

# save model (xgb is the trained model object)
joblib.dump(xgb, filename)

# load saved model
xgb = joblib.load(filename)
Ioannis Nasios
  • It's not good if you want to save and load the model across languages. For example, you want to train the model in Python but predict in Java. – oshribr Sep 05 '18 at 09:47
  • This is the approach advised by the XGB developers when you are using the sklearn API of xgboost. XGBClassifier & XGBRegressor should be saved like this, through the pickle format. – Abhilash Awasthi Apr 15 '19 at 07:53
  • It says joblib is deprecated on Python 3.8 – Yi Lin Liu May 19 '19 at 03:36
  • There will be incompatibility when you save and load as pickle across different versions of Xgboost. – dhanush-ai1990 Jul 13 '20 at 22:37
12

If you are using the sklearn API, you can use the following:


import xgboost

xgb_model_latest = xgboost.XGBClassifier()  # or whichever sklearn booster you're using

xgb_model_latest.load_model("model.json")  # or model.bin if you are using the binary format instead of json

If you instead use the Booster method for loading shown in other answers (bst = xgb.Booster(); bst.load_model(...)), you will get the xgboost Booster from the native Python API, not the sklearn estimator from the sklearn API.

So this seems to be the most pythonic way to load a saved xgboost model if you are using the sklearn API.
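
A full round trip with the sklearn API might look roughly like this (X_train, y_train, X_test and the file name are placeholders):

import xgboost

# train and save with the sklearn wrapper
clf = xgboost.XGBClassifier()
clf.fit(X_train, y_train)
clf.save_model("model.json")

# later: load back into a fresh sklearn estimator rather than a raw Booster
clf_loaded = xgboost.XGBClassifier()
clf_loaded.load_model("model.json")
preds = clf_loaded.predict(X_test)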

Robert Beatty
  • I have used this method but am not getting the parameters of the previously saved model when using `xgb_model_latest.get_params()`. – Galo Castillo Dec 16 '20 at 16:43
  • I have the same problem. – Ravi Apr 01 '21 at 16:41
  • Not having that issue. Default values are treated in a way I don't like, but I do get the params I put in. I'm using 1.2.1. Feel free to post your code to try and workshop this. – Robert Beatty Apr 02 '21 at 18:08