I have trained an XGB model on labelled customer payment data, with the aim of predicting future payment behavior as one of two classes (a 2-level factor).
XGB.prediction <- predict(object = XGB,
                          newdata = df.test,
                          type = "prob")
df.test is a data frame of 2534 obs. of 43 variables.
I therefore expect XGB.prediction to have 2534 obs. of 2 variables (the class probabilities). However, it has only 1416 obs.
I have tried to determine whether NA values could have caused this:
> anyNA(df.test$Class)
[1] FALSE
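The check above only covers the Class column. A broader check, sketched here on a toy data frame because the real df.test is not shown, is to count NAs across every column. The working assumption, which I have not confirmed for my data, is that rows with missing predictors are dropped before prediction (as far as I know, caret's predict.train defaults to na.action = na.omit), which would produce a shorter result. The names toy.test, amount and days are hypothetical.

```r
# Toy stand-in for df.test (hypothetical columns; the real data has 43 variables)
toy.test <- data.frame(
  Class  = factor(c("payer", "nonpayer", "payer", "nonpayer")),
  amount = c(100, NA, 250, 80),
  days   = c(12, 30, NA, 5)
)

# Checking the response alone does not reveal NAs in the predictors
response.has.na <- anyNA(toy.test$Class)

# Number of NAs in each column
na.per.column <- colSums(is.na(toy.test))

# Rows with no NA in any column, i.e. the rows that survive na.omit()
n.complete   <- sum(complete.cases(toy.test))
n.after.omit <- nrow(na.omit(toy.test))

print(na.per.column)
print(c(total = nrow(toy.test), complete = n.complete))
```

If sum(complete.cases(df.test)) on the real data came out as 1416, that would point to missing predictor values rather than a missing response as the cause of the shorter prediction output.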
The length mismatch causes an error when I try to evaluate my model with a ROC curve:
> xgb.roc <- roc(response = df.test$Class,
                 auc = TRUE,
                 plot = TRUE,
                 predictor = XGB.prediction[,"payer"])
Error in roc.default(response = df.test$Class, auc = TRUE, plot = TRUE, :
  Response and predictor must be vectors of the same length.
The model training call is as follows:
XGB <- train(
  Class ~ .,
  data = df.train,
  trControl = ctrl,
  method = "xgbTree",
  tuneGrid = grid.xgboost,
  importance = 'impurity',
  metric = "ROC")