I'm trying to understand how XGBoost works for a multiclass problem. I used the iris dataset to predict which species an observation belongs to based on its measurements, and computed the results in R.
The code is below
library(xgboost)

test <- as.data.frame(iris)

# Encode the three species as 0-based integer class labels
test$y <- ifelse(test$Species == "setosa", 0,
          ifelse(test$Species == "versicolor", 1,
          ifelse(test$Species == "virginica", 2, 3)))

x_iris <- test[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
y_iris <- test[, "y"]
# Fit a single boosting round so the trees can be checked by hand
iris_model <- xgboost(data = data.matrix(x_iris), label = y_iris,
                      eta = 0.1, base_score = 0.5, nrounds = 1,
                      subsample = 1, colsample_bytree = 1, num_class = 3,
                      max_depth = 4, lambda = 0,
                      eval_metric = "mlogloss", objective = "multi:softprob")

xgb.plot.tree(model = iris_model, feature_names = colnames(x_iris))
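For the manual comparison described next, the per-node gain and cover values can also be read off the fitted model as a table. This uses xgb.model.dt.tree from the xgboost package; the Quality column holds the split gain for internal nodes and the leaf weight for leaves, and Cover holds the summed hessian at each node.

# One row per node; with one round and num_class = 3 there are three trees
tree_dt <- xgb.model.dt.tree(model = iris_model, feature_names = colnames(x_iris))
print(tree_dt)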
I tried to manually compute the results and compare the gain and cover values with the R output. I noticed a couple of things:
- The initial probability is always 1/(number of classes), irrespective of what is passed to the 'base_score' parameter in R. The 'base_score' instead gets added at the end, to the final log-odds value, and this matches the R output when the predict function is called with outputmargin = TRUE to return log-odds (a small check is sketched after this list). In binary classification, by contrast, the 'base_score' parameter is used as the model's initial probability.

predict(iris_model, data.matrix(x_iris), reshape = TRUE, outputmargin = TRUE)
- The per-instance second-order term (the hessian of the loss, which feeds into cover) is (2.0f * p * (1.0f - p) * wt) for multiclass problems and (p * (1.0f - p) * wt) for binary problems, where p is the predicted probability and wt the instance weight (see the second sketch below).
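A minimal check of the base_score point, reusing the model and the tree_dt table from above (the 0.5 subtracted below is just the base_score value chosen earlier):

# Raw per-class margins (log-odds before the softmax), one column per class
margins <- predict(iris_model, data.matrix(x_iris),
                   reshape = TRUE, outputmargin = TRUE)

# With a single round, subtracting base_score (0.5) from each margin should
# leave exactly the leaf value (Quality column of tree_dt) the observation
# falls into in that class's tree
margins[1, ] - 0.5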
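And a small R sketch of the multiclass gradient/hessian expressions above (the function name and signature are mine for illustration, not part of the xgboost package; labels are 0-based as in the encoding above):

# Per-class gradient and hessian of the softmax cross-entropy loss for one
# observation, mirroring the multiclass expressions quoted above
softprob_grad_hess <- function(margins, label, wt = 1) {
  p <- exp(margins - max(margins))   # numerically stable softmax
  p <- p / sum(p)
  grad <- (p - ((seq_along(p) - 1) == label)) * wt
  hess <- 2 * p * (1 - p) * wt       # the 2.0f * p * (1.0f - p) * wt term
  list(grad = grad, hess = hess)
}

# At round 1 every class margin equals base_score (0.5), and a constant
# shared across classes cancels in the softmax, so p = 1/3 per class,
# which is the 1/(number of classes) initial probability noted above
softprob_grad_hess(c(0.5, 0.5, 0.5), label = 0)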
There is an explanation of this term in the GitHub issue https://github.com/dmlc/xgboost/issues/638, but no information on why the base_score gets added at the end.
Is this behaviour specific to the R implementation, or is this simply how the XGBoost multiclass algorithm works?