Could someone explain how the Cover column in the xgboost R package is calculated in the xgb.model.dt.tree function?
The documentation says that Cover "is a metric to measure the number of observations affected by the split".
When you run the following code, given in the xgboost documentation for this function, Cover for node 0 of tree 0 is 1628.2500.
library(xgboost)
data(agaricus.train, package = 'xgboost')
# The dataset is a list with two items: a sparse matrix and labels
# (labels = outcome column which will be learned).
# Each column of the sparse matrix is a feature in one-hot encoding format.
train <- agaricus.train
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
               eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
# agaricus.train$data@Dimnames[[2]] holds the column names of the sparse matrix.
xgb.model.dt.tree(agaricus.train$data@Dimnames[[2]], model = bst)
There are 6513 observations in the training dataset, so can anyone explain why Cover for node 0 of tree 0 is a quarter of this number (1628.25)?
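To make the "quarter" observation concrete, here is a small arithmetic check. It is only a sketch using the two numbers quoted in this post; the variable names are my own and are not part of the xgboost package.

```r
# Number of training observations, as stated in this post
n_obs <- 6513

# Cover reported by xgb.model.dt.tree for node 0 of tree 0
cover_root <- 1628.25

# Ratio of Cover to the number of observations
ratio <- cover_root / n_obs
print(ratio)
```

The ratio comes out to one quarter, so Cover here is clearly not a plain count of observations.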
Also, Cover for node 1 of tree 1 is 788.852. How is this number calculated?
Any help would be much appreciated. Thanks.