I've trained a model in Python using the following code (I didn't use a testing set for this example; I trained and predicted on the same dataset to make the problem easier to illustrate):
import numpy as np
import xgboost as xgb

# Note: the xgb.train parameter names are 'objective' and 'min_child_weight' (fixed from 'obj' and 'min_weight')
params = {'learning_rate': 0.1, 'objective': 'binary:logistic', 'n_estimators': 250, 'scale_pos_weight': 0.2, 'max_depth': 15, 'min_child_weight': 1, 'colsample_bytree': 1, 'gamma': 0.1, 'subsample': 0.95}
X = np.array(trainingData, dtype=np.uint32)  # training data was generated from a csv
X = xgb.DMatrix(np.asmatrix(X), label=Y)     # Y holds the binary labels
clf = xgb.train(params, X)
clf.save_model('xgb_test.model')
X.save_binary('test.buffer')
answer = clf.predict(X)
The prediction produced around 40k zeros and 270k ones.
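For reference, the zero/one tallies on the Python side come from thresholding the predicted probabilities, roughly like this (a sketch; the 0.5 cutoff is an assumption, chosen to mirror the C++ loop below):

# Sketch: count predictions on either side of an assumed 0.5 cutoff
zero_count = int(np.sum(answer < 0.5))
one_count = int(np.sum(answer >= 0.5))
print("Number of Zeroes:", zero_count)
print("Number of Ones:", one_count)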
The model is then loaded into C++ with the following code:
#include <cstdio>
#include <xgboost/c_api.h>

// Create the booster handle before loading the model into it
BoosterHandle handle;
XGBoosterCreate(nullptr, 0, &handle);
const char * fileName = "blahblah/xgb_test.model";
int x = XGBoosterLoadModel(handle, fileName);
if (x == 0) {
    printf("Successfully Loaded Model\n");
}
// Load the DMatrix saved from Python as a binary buffer
DMatrixHandle dHandle;
const char * predictionData = "blahblah/test.buffer";
x = XGDMatrixCreateFromFile(predictionData, 0, &dHandle);
if (x == 0) {
    printf("Successfully Loaded Data\n");
}
bst_ulong out2;
const float *m_TestResults2;
x = XGBoosterPredict(handle, dHandle, 0, 1, &out2, &m_TestResults2);
if (x == 0) {
    printf("Successful Prediction\n");
}
int zeroCount = 0;
int oneCount = 0;
for (bst_ulong i = 0; i < out2; i++) {
    if (m_TestResults2[i] < 0.5) {
        zeroCount++;
    }
    else {
        oneCount++;
    }
}
printf("Number of Zeroes: %d\n", zeroCount);
printf("Number of Ones: %d\n", oneCount);
For the C++ prediction I obtained around 55k zeroes.
I have tried the following:
- Making sure the model is trained in Python using a dense matrix, since XGBoosterPredict takes a dense matrix (an assumption I arrived at from a similar question on Stack Overflow)
- Using xgb.train instead of XGBClassifier.fit, since train takes a DMatrix and fit does not
- Converting the training data to a NumPy matrix using np.asmatrix(X)
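One further check I could run on the Python side (a sketch I haven't actually run yet; it just reloads the two files saved above and re-predicts, to confirm that the serialized model and buffer by themselves reproduce the original counts):

# Sketch: reload the saved model and DMatrix buffer and re-predict in Python
bst = xgb.Booster()
bst.load_model('xgb_test.model')
dtest = xgb.DMatrix('test.buffer')  # binary buffer written by X.save_binary above
reloaded = bst.predict(dtest)
print(int(np.sum(reloaded < 0.5)), "zeros,", int(np.sum(reloaded >= 0.5)), "ones")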
Does anyone have any idea what I've done wrong? Thanks.