I am new to R programming language and I need to run "xgboost" for some experiments. The problem is that I need to cross-validate the model and get the accuracy and I found two ways that give me different results:
With "caret" using:
library(mlbench)
library(caret)
library(caretEnsemble)
dtrain <- read.csv("student-mat.csv", header=TRUE, sep=";")
formula <- G3~.
dtrain$G3<-as.factor(dtrain$G3)
control <- trainControl(method="cv", number=10)
seed <- 10
metric <- "Accuracy"
fit.xgb <- train(formula, data=dtrain, method="xgbTree", metric=metric, trControl=control, nthread =4)
fit.xgb
fit.xgbl <- train(formula, data=dtrain, method="xgbLinear", metric=metric, trControl=control, nthread =4)
fit.xgbl
Using the "xgboost" package and the following code:
library(xgboost)
printArray <- function(label, array){
cat(paste(label, paste(array, collapse = ", "), sep = ": \n"), "\n\n")
setwd("D:\\datasets")
dtrain <- read.csv("moodle7original(AtributosyNotaNumericos).csv", header=TRUE, sep=",")
label <- as.numeric(dtrain[[33]])
data <- as.matrix(sapply(dtrain, as.numeric))
croosvalid <-
xgb.cv(
data = data,
nfold = 10,
nround = 10,
label = label,
prediction = TRUE,
objective = "multi:softmax",
num_class = 33
)
print(croosvalid)
printArray("Actual classes", label[label != croosvalid\$pred])
printArray("Predicted classes", croosvalid\$pred[label != croosvalid\$pred])
correctlyClassified <- length(label[label == croosvalid\$pred])
incorrectlyClassified <- length(label[label != croosvalid\$pred])
accurancy <- correctlyClassified * 100 / (correctlyClassified + incorrectlyClassified)
print(paste("Accurancy: ", accurancy))
But the results differ very much on the same dataset. I usually get 99% accuracy on student performance dataset with the second snip of code and ~63% with the first one...
I set the same seed on both of them.
Am I wrong with the second? Please tell me why if so!