I am using the xgboost package in R.

First, I want to tune the parameters on a validation set (20% of the data set). Second, I want to fit a model for a binary classification task and evaluate its predictions with 5-fold cross-validation on the remaining 80%. In each of the five iterations, 64% (80% × 80%) of the data is the training set and 16% (80% × 20%) is the test set.
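To make the 64%/16% split concrete, here is a minimal base R sketch of a 5-fold partition of the 520 rows that remain after the 130 tuning rows are held out. It assumes rows are assigned to folds at random; the names `n_rows`, `n_folds`, `fold_id`, `train_idx`, and `test_idx` are illustrative and do not appear in my actual code.

```r
# Assumption: 520 rows remain after holding out 130 of the 650 for tuning.
set.seed(650)
n_rows  <- 520
n_folds <- 5

# Give every row a fold label 1..5 (104 rows each), in random order.
fold_id <- sample(rep(seq_len(n_folds), length.out = n_rows))

for (i in seq_len(n_folds)) {
  test_idx  <- which(fold_id == i)  # 104 rows, i.e. 16% of 650
  train_idx <- which(fold_id != i)  # 416 rows, i.e. 64% of 650
  cat(sprintf("fold %d: train = %d rows, test = %d rows\n",
              i, length(train_idx), length(test_idx)))
}
```

Each row lands in the test set exactly once across the five iterations, which is what distinguishes this from a single fixed split.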

First, I use xgb.cv to tune the parameters. Related questions are here and "xgboost in R: how does xgb.cv pass the optimal parameters into xgb.train".

set.seed(650)
tr.num <- sample(650, 130)  # I have 650 samples; 130 (20%) are held out for tuning.
data.tuning<-data[tr.num,]
data.traintest<-data[-tr.num,]

x.tune <- data.tuning[,2:9]
x.tune <- as.matrix(x.tune)
k<-round(1+log2(130))
cv.nround <- 200 #search
bst.cv <- xgb.cv(param=param, data = x.tune, label = data.tuning[,10],nfold = k, nrounds=cv.nround, metrics=list("error"), prediction = TRUE)

......

[2] train-error:0.017573+0.008109 test-error:0.108456+0.104800

[3] train-error:0.013177+0.006646 test-error:0.100643+0.100299

[4] train-error:0.008782+0.004689 test-error:0.100643+0.100299

[5] train-error:0.003299+0.004553 test-error:0.100643+0.100299

[6] train-error:0.000000+0.000000 test-error:0.100643+0.100299

[7] train-error:0.000000+0.000000 test-error:0.108456+0.104800

[8] train-error:0.000000+0.000000 test-error:0.107996+0.086933

......

I selected nrounds = 7 because it gave the minimum test-error.

Second, I use xgb.cv again for 5-fold cross-validation, in order to get the model and to find its precision and recall. How should I do this?

x.traintest <- data.traintest[,2:9]
x.traintest <- as.matrix(x.traintest)
bst.cv <- xgb.cv(param=param, data = x.traintest, label = data.traintest[,10], nrounds=7, nfold = 5)

test <- 1:104  # 650 * 0.16 = 104
train <- 105:520

y.traintest <- data.traintest[,10]
y.traintest <- as.matrix(y.traintest)

bst <- xgboost(param=param, data = x.traintest[train,], label=y.traintest[train,], nrounds=7, nfold = 5)
pred <- predict(bst,x.traintest[test,])
# Vectorized thresholding; assigning strings inside a loop would coerce all of pred to character.
pred <- ifelse(pred > 0.5, "case", "no")
table(y.traintest[test,],pred)

Is this 5-fold cross-validation and prediction? I want to get the average recall and precision over the five folds. How should I do this? I also don't understand how to use prediction = TRUE.
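To make clear what I mean by "average precision and recall", here is a minimal base R sketch that computes both metrics per fold and then averages them. It assumes out-of-fold predicted probabilities and a fold label for every row are already available. As far as I understand, `xgb.cv(..., prediction = TRUE)` is meant to return such out-of-fold predictions, but where they are stored in the result may depend on the installed xgboost version, so that part is an assumption to check. The helper names `precision_recall` and `cv_precision_recall` and the toy vectors are hypothetical.

```r
# Precision and recall for one fold; `positive` is the label treated as the case class.
precision_recall <- function(actual, predicted, positive = 1) {
  tp <- sum(predicted == positive & actual == positive)
  fp <- sum(predicted == positive & actual != positive)
  fn <- sum(predicted != positive & actual == positive)
  c(precision = tp / (tp + fp), recall = tp / (tp + fn))
}

# Threshold the out-of-fold probabilities, score each fold, then average over folds.
cv_precision_recall <- function(actual, prob, fold_id, threshold = 0.5) {
  predicted <- as.integer(prob > threshold)
  per_fold <- sapply(sort(unique(fold_id)), function(i) {
    in_fold <- fold_id == i
    precision_recall(actual[in_fold], predicted[in_fold])
  })
  rowMeans(per_fold)  # one averaged value each for precision and recall
}

# Toy input, made up purely to show the call (two folds of four rows each).
actual  <- c(1, 0, 1, 0, 1, 1, 0, 0)
prob    <- c(0.9, 0.2, 0.4, 0.6, 0.8, 0.7, 0.1, 0.3)
fold_id <- c(1, 1, 1, 1, 2, 2, 2, 2)
cv_precision_recall(actual, prob, fold_id)
```

If a fold contains no predicted or no actual positives, the corresponding ratio is 0/0 and comes out as NaN, so small folds may need special handling.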

Related questions are here, here, and here.

Am I misunderstanding cross-validation or gradient boosting?

rrkk