Using XGBoost in C++

Question

How can I use the XGBoost library (https://github.com/dmlc/xgboost/) in C++? I have found the Python and Java APIs, but I can't find an API for C++.

Yes, I have read the installation guide, but I can't find an example of using XGBoost with C++. – V. Gai, Mar 17 '16 at 21:12
Try this link: https://stackoverflow.com/questions/49744351/xgboost-prediction-is-different-for-c-and-python-for-the-same-model – , Dec 10 '18 at 20:11

Tomer · Accepted Answer · score 31 · 2016-04-15T16:03:14.780

I ended up using the C API; here is an example:

#include <iostream>
#include <xgboost/c_api.h>

int main() {
    // create the train data
    // (rows and cols are const so the arrays below are standard C++, not VLAs)
    const int cols = 3, rows = 5;
    float train[rows][cols];
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            train[i][j] = (i + 1) * (j + 1);

    float train_labels[rows];
    for (int i = 0; i < rows; i++)
        train_labels[i] = 1 + i * i * i;

    // convert to DMatrix; -1 is the value that marks missing entries
    DMatrixHandle h_train[1];
    XGDMatrixCreateFromMat((float *) train, rows, cols, -1, &h_train[0]);

    // load the labels
    XGDMatrixSetFloatInfo(h_train[0], "label", train_labels, rows);

    // read back the labels, just a sanity check
    bst_ulong bst_result;
    const float *out_floats;
    XGDMatrixGetFloatInfo(h_train[0], "label", &bst_result, &out_floats);
    for (unsigned int i = 0; i < bst_result; i++)
        std::cout << "label[" << i << "]=" << out_floats[i] << std::endl;

    // create the booster and set some parameters
    BoosterHandle h_booster;
    XGBoosterCreate(h_train, 1, &h_booster);
    XGBoosterSetParam(h_booster, "booster", "gbtree");
    // newer releases call this objective "reg:squarederror"
    XGBoosterSetParam(h_booster, "objective", "reg:linear");
    XGBoosterSetParam(h_booster, "max_depth", "5");
    XGBoosterSetParam(h_booster, "eta", "0.1");
    XGBoosterSetParam(h_booster, "min_child_weight", "1");
    XGBoosterSetParam(h_booster, "subsample", "0.5");
    XGBoosterSetParam(h_booster, "colsample_bytree", "1");
    XGBoosterSetParam(h_booster, "num_parallel_tree", "1");

    // perform 200 learning iterations
    for (int iter = 0; iter < 200; iter++)
        XGBoosterUpdateOneIter(h_booster, iter, h_train[0]);

    // predict
    const int sample_rows = 5;
    float test[sample_rows][cols];
    for (int i = 0; i < sample_rows; i++)
        for (int j = 0; j < cols; j++)
            test[i][j] = (i + 1) * (j + 1);
    DMatrixHandle h_test;
    XGDMatrixCreateFromMat((float *) test, sample_rows, cols, -1, &h_test);
    bst_ulong out_len;
    const float *f;  // points into memory owned by XGBoost; do not free it
    // 2016-era signature; newer releases add an int `training` argument
    // before &out_len: XGBoosterPredict(h_booster, h_test, 0, 0, 0, &out_len, &f)
    XGBoosterPredict(h_booster, h_test, 0, 0, &out_len, &f);

    for (unsigned int i = 0; i < out_len; i++)
        std::cout << "prediction[" << i << "]=" << f[i] << std::endl;

    // free xgboost internal structures
    XGDMatrixFree(h_train[0]);
    XGDMatrixFree(h_test);
    XGBoosterFree(h_booster);
    return 0;
}

edited Apr 15 '16 at 16:03

answered Apr 14 '16 at 19:44

Tomer

Did you find out how to free the `const float *f;`? When I predict a large amount of data, the memory does not seem to be freed. Looking at the code, `XGDMatrixFree(h_test)` should do it, but the memory leak still grows with the size of h_test! – Khaledvic May 01 '17 at 15:12
Sounds like the leak is elsewhere; did you confirm with Valgrind? – Tomer May 02 '17 at 15:58
Apparently `XGBoosterPredict` isn't thread-safe; I was calling it from a large number of threads: https://github.com/dmlc/xgboost/issues/311 – Khaledvic May 03 '17 at 17:30
How did you install the library for C++? Also, what `#include`s are you using? @Tomer @Khaledvic – Meet Taraviya May 30 '17 at 13:06
Use `#include `, and don't forget to link the built XGBoost libraries in your makefile: `LDLIBSOPTIONS=../xgboost/lib/libxgboost.a ../xgboost/rabit/lib/librabit.a ../xgboost/dmlc-core/libdmlc.a`. And of course add xgboost and rabit to the include paths (gcc command): `-I../xgboost/include -I../xgboost/rabit/include` – Khaledvic May 30 '17 at 16:09
I made a makefile like this: `LDLIBSOPTIONS=xgboost/lib/libxgboost.a xgboost/rabit/lib/librabit.a CFLAGS=-I xgboost/include -I xgboost/rabit/includemain.cpp all: g++ main.cpp $(CFLAGS) $(LDLIBSOPTIONS)`. main.cpp and the xgboost folder are in the same directory as the `Makefile`. I am getting a linking error. What am I doing wrong? – Meet Taraviya May 31 '17 at 09:36
By the way, I am using Linux. – Meet Taraviya May 31 '17 at 09:39
On OS X, I added includes for `stdint.h`, `xgboost/c_api.h`, and `iostream`, and successfully compiled with `g++-6 -I../../xgboost/include -I../../xgboost/rabit/include test_xgboost.cpp ../../xgboost/lib/libxgboost.a ../../xgboost/rabit/lib/librabit.a ../../xgboost/dmlc-core/libdmlc.a -fopenmp` – mwag Sep 08 '17 at 01:15

Yan · Answer 2 · 2018-02-25T11:52:36.480

Use the XGBoost C API.

  #include <cassert>
  #include <cmath>      // NAN
  #include <iostream>
  #include <xgboost/c_api.h>

  int main() {
    BoosterHandle booster;
    const char *model_path = "/path/of/model";

    // create booster handle first
    XGBoosterCreate(NULL, 0, &booster);

    // the seed defaults to 0; set it explicitly anyway
    XGBoosterSetParam(booster, "seed", "0");

    // load model
    XGBoosterLoadModel(booster, model_path);

    const int feat_size = 100;
    const int num_row = 1;
    float feat[num_row][feat_size];

    // create some fake data for predicting
    for (int i = 0; i < num_row; ++i) {
      for (int j = 0; j < feat_size; ++j) {
        feat[i][j] = (i + 1) * (j + 1);
      }
    }

    // convert 2d array to DMatrix; NAN marks missing values
    DMatrixHandle dtest;
    XGDMatrixCreateFromMat(reinterpret_cast<float*>(feat),
                           num_row, feat_size, NAN, &dtest);

    // predict (newer releases add an int `training` argument before &out_len:
    // XGBoosterPredict(booster, dtest, 0, 0, 0, &out_len, &f))
    bst_ulong out_len;
    const float *f;
    XGBoosterPredict(booster, dtest, 0, 0, &out_len, &f);
    assert(out_len == num_row);  // one score per row for single-output models
    std::cout << f[0] << std::endl;

    // free memory
    XGDMatrixFree(dtest);
    XGBoosterFree(booster);
    return 0;
  }

Note that when you load an existing model (as the code above does), you have to make sure the data format used in training is the same as in prediction. So if you build the prediction DMatrix from a dense array with XGDMatrixCreateFromMat, as above, you should also train on a dense matrix.

Training on libsvm-format data and predicting on a dense matrix may give wrong predictions; as the XGBoost FAQ says:

“Sparse” elements are treated as if they were “missing” by the tree booster, and as zeros by the linear booster. For tree models, it is important to use consistent data formats during training and scoring.

score 2 · Answer 3 · answered May 22 '20 at 09:58

Here is what you need: https://github.com/EmbolismSoil/xgboostpp

#include "xgboostpp.h"
#include <algorithm>
#include <iostream>

int main(int argc, const char* argv[])
{
    auto nsamples = 2;
    auto xgb = XGBoostPP(argv[1], 3); // 4 feature columns and 3 labels (in the iris example, the three flower species); for a regression task, use nlabel=1 here

    //result = array([[9.9658281e-01, 2.4966884e-03, 9.2058454e-04],
    //       [9.9608469e-01, 2.4954407e-03, 1.4198524e-03]], dtype=float32)
    XGBoostPP::Matrix features(2, 4);
    features <<
        5.1, 3.5, 1.4, 0.2,
        4.9, 3.0, 1.4, 0.2;

    XGBoostPP::Matrix y;
    auto ret = xgb.predict(features, y);
    if (ret != 0){
        std::cout << "predict error" << std::endl;
    }

    std::cout << "input : \n" << features << std::endl << "output: \n" << y << std::endl;
}

score 1 · Answer 4 · answered Sep 22 '20 at 13:33

If training in Python is acceptable and you only need to run the prediction in C++, there is a nice tool that generates static if/else code from a trained model:

https://github.com/popcorn/xgb2cpp

I ended up using this after spending a day trying, without success, to load and use an XGBoost model in C++. The code generated by xgb2cpp worked right away, and it has the added benefit of having no dependencies.
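To give an idea of what "static if/else code" means for a tree ensemble, here is a hand-written illustration with two tiny regression trees. The thresholds, leaf values, and function names are made up, and this is not actual xgb2cpp output. Each tree becomes a nested if/else that returns a leaf value, and for a squared-error regression objective the prediction is base_score plus the sum of the leaves. Real generated code must also encode each split's default direction for missing values, which is omitted here.

```cpp
// Hypothetical illustration only: hand-written trees with made-up values,
// not xgb2cpp output. As in XGBoost trees, a sample takes the left ("yes")
// branch when feature < threshold. Expects 3 features per sample.

// Tree 0: splits on features 0, 1 and 2.
static float tree0(const float *x) {
    if (x[0] < 2.5f) {
        return x[1] < 1.0f ? -0.4f : 0.1f;
    }
    return x[2] < 3.0f ? 0.3f : 0.6f;
}

// Tree 1: splits on features 1 and 0.
static float tree1(const float *x) {
    if (x[1] < 2.0f) {
        return -0.2f;
    }
    return x[0] < 4.0f ? 0.05f : 0.25f;
}

// Prediction = base_score + sum of the leaf values reached in each tree.
// 0.5 was XGBoost's default base_score in older releases; a real model
// carries its own value.
float predict_if_else(const float *x) {
    const float base_score = 0.5f;
    return base_score + tree0(x) + tree1(x);
}
```

For example, the sample {1, 0.5, 0} reaches leaves -0.4 and -0.2, giving 0.5 - 0.4 - 0.2 = -0.1.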

score 0 · Answer 5 · answered Apr 07 '16 at 09:24

There is no example that I am aware of. There is a c_api.h file that contains a C/C++ API for the package, and you'll have to find your way using it. I just did that: it took me a few hours of reading the code and trying a few things out, but eventually I managed to create a working C++ example using XGBoost.

score 0 · Answer 6 · answered May 31 '17 at 13:21

To solve this problem, we run the XGBoost program from C++ source code.

V. Gai
