I am trying to use xgboost in R to extract rules (gbtree) from my data, so that I can apply the rules in another system instead of scoring data with 'predict'. The input data has approx. 1,500 columns and 40 million rows of sparse binary features, and the label is a binary column, too.
library(xgboost)
library(Matrix)
# 100 random 0/1 labels, one per row of the observations below
labels <- data.frame(labels = sample.int(2, size = 100, replace = TRUE) - 1L)
observations <- Matrix(as.matrix(data.frame(
  feat_01 = sample.int(2, size = 100, replace = TRUE) - 1,
  feat_02 = sample.int(2, size = 100, replace = TRUE) - 1,
  feat_03 = sample.int(2, size = 100, replace = TRUE) - 1,
  feat_04 = sample.int(2, size = 100, replace = TRUE) - 1,
  feat_05 = sample.int(2, size = 100, replace = TRUE) - 1)), sparse = TRUE)
dtrain <- xgb.DMatrix(data = observations, label = labels$labels)
bstResult <- xgb.train(data = dtrain,
                       # booster settings go into the params list
                       params = list(booster = "gbtree",
                                     objective = "binary:logistic",
                                     max_depth = 3,
                                     nthread = 1),
                       nrounds = 4,
                       verbose = 1)
xgb.dump(bstResult)
xgb.plot.tree(model = bstResult, n_first_tree = 2)
I can inspect the model with xgb.dump or xgb.plot.tree, but I need the rules in a form like:
rule1: feat_01 == 1 & feat_02 == 1 & feat_03 == 0 --> Label = 1
rule2: feat_01 == 0 & feat_03 == 1 & feat_04 == 1 --> Label = 0
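To make the target format concrete, here is a minimal sketch of the path enumeration I have in mind. It does not touch a real model: it runs on a small hand-written node table, loosely modelled on the per-node table that xgb.model.dt.tree returns. The column names (ID, Feature, Yes, No, Value) and the helper paths_to_rules are hypothetical, and I am assuming every split on a 0/1 feature sends 0 to the "Yes" child and 1 to the "No" child. Note also that a leaf of a boosted tree holds a raw score that is summed over all trees, not a final label, so the sketch prints "score" rather than "Label".

```r
# Toy node table (hand-written, hypothetical layout): one row per node.
# Assumption: for a 0/1 feature, the "Yes" child is taken when the
# feature is 0 and the "No" child when it is 1.
tree <- data.frame(
  ID      = c("0-0", "0-1", "0-2", "0-3", "0-4"),
  Feature = c("feat_01", "feat_02", "Leaf", "Leaf", "Leaf"),
  Yes     = c("0-1", "0-3", NA, NA, NA),
  No      = c("0-2", "0-4", NA, NA, NA),
  Value   = c(NA, NA, 0.4, -0.3, 0.1),  # leaf score (raw margin, not a label)
  stringsAsFactors = FALSE
)

# Walk recursively from a node to every leaf below it, collecting the
# conditions met along the way; returns one rule string per leaf.
paths_to_rules <- function(tree, id = tree$ID[1], conds = character(0)) {
  node <- tree[tree$ID == id, ]
  if (node$Feature == "Leaf") {
    lhs <- if (length(conds) > 0) paste(conds, collapse = " & ") else "TRUE"
    return(sprintf("%s --> score = %s", lhs, format(node$Value)))
  }
  c(
    paths_to_rules(tree, node$Yes, c(conds, paste(node$Feature, "== 0"))),
    paths_to_rules(tree, node$No,  c(conds, paste(node$Feature, "== 1")))
  )
}

rules <- paths_to_rules(tree)
cat(sprintf("rule%d: %s", seq_along(rules), rules), sep = "\n")
```

With the toy table above this gives three rules, one per leaf. What I do not know is how to get from the scores of several boosted trees to a single Label per rule.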
Is this possible or am I on the wrong track?
Regards Heiko
edit: added example and tried to make the question better