
I created an XGBoost model to classify a binary variable in a dataset I named 'heart'. The problem is that the model gives 100 percent accuracy, which makes me think I have done something wrong. I coded the training and validation sets and the model as follows; my response variable is in column 12. I'm new to this model and assume there are some obvious problems with my code, so sorry about that, but can anyone explain why I am getting 100 percent accuracy and how I can fix the code? I also tried the same process on several other datasets and got 100 percent accuracy every time, which tells me something must be wrong. Any help would be greatly appreciated. Thanks

library(xgboost)

training<-sample(1:3999, floor(0.6*3999))
head(training)
train.df<-heart[training,]
validation<-setdiff(c(1:3999),training)
valid.df<-heart[validation,]

train.df<-as.matrix(train.df)
valid.df<-as.matrix(valid.df)
bst<-xgboost(data = train.df, label = train.df[,12], max.depth = 2,
               eta = 1, nthread = 2, nround = 2, objective = "binary:logistic")

pred <- predict(bst, valid.df)
pred<-ifelse(pred > 0.5, 1, 0)

mean(pred == valid.df[,12])
S J
  • Hi Samuel, it is always helpful on StackOverflow to provide a reproducible example so others can reproduce the problem and propose a solution that works for your dataset. Please see https://stackoverflow.com/help/minimal-reproducible-example and https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. – Reilstein Mar 01 '21 at 01:28
  • It seems the target (`label`) variable is being used as one of the input variables. Try removing it, as in `bst<-xgboost(data = train.df[, -12], label = train.df[,12],...)` – kangaroo_cliff Mar 01 '21 at 03:08
  • OK, thank you Suren, that was the problem! – S J Mar 01 '21 at 16:34
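A minimal sketch of the fix suggested in the comments, assuming the label sits in column 12 as in the question. Because `train.df` still contained column 12, a single tree split on that column reproduces the label exactly, which explains the 100 percent accuracy. The validation matrix has to drop the same column before `predict()`. Since `heart` isn't available here, a toy data frame stands in for it; `split_features_label()` and `leaky_cols()` are hypothetical helpers, not part of xgboost. The training step uses `xgb.DMatrix()`/`xgb.train()` instead of the `xgboost()` wrapper from the question and only runs if the package is installed.

```r
# Hypothetical helper: reproducible train/validation split that separates the
# label column from the features so it cannot leak into training.
split_features_label <- function(df, label_col, train_frac = 0.6, seed = 1) {
  set.seed(seed)                                  # make the split reproducible
  n <- nrow(df)                                   # avoid hard-coding the row count
  train_idx <- sample(seq_len(n), floor(train_frac * n))
  valid_idx <- setdiff(seq_len(n), train_idx)
  x <- as.matrix(df[, -label_col, drop = FALSE])  # features only: label dropped
  y <- df[[label_col]]                            # label vector
  list(
    x_train = x[train_idx, , drop = FALSE], y_train = y[train_idx],
    x_valid = x[valid_idx, , drop = FALSE], y_valid = y[valid_idx]
  )
}

# Hypothetical leakage check: which feature columns are identical to the label?
leaky_cols <- function(x, y) {
  which(apply(x, 2, function(col) isTRUE(all(col == y))))
}

# Toy stand-in for 'heart' (assumption: 11 numeric features, label in column 12)
set.seed(42)
toy <- as.data.frame(matrix(rnorm(100 * 11), ncol = 11))
toy$label <- as.integer(toy$V1 + rnorm(100) > 0)

leaky_cols(as.matrix(toy), toy$label)        # flags column 12 ("label")
parts <- split_features_label(toy, label_col = 12)
leaky_cols(parts$x_train, parts$y_train)     # empty once the label is dropped

if (requireNamespace("xgboost", quietly = TRUE)) {
  dtrain <- xgboost::xgb.DMatrix(parts$x_train, label = parts$y_train)
  bst <- xgboost::xgb.train(
    params = list(max_depth = 2, eta = 1, nthread = 2,
                  objective = "binary:logistic"),
    data = dtrain, nrounds = 2, verbose = 0
  )
  # Predict on the validation features, which also exclude the label column
  prob <- predict(bst, xgboost::xgb.DMatrix(parts$x_valid))
  pred <- as.integer(prob > 0.5)
  print(mean(pred == parts$y_valid))         # realistic accuracy, not 1.0
}
```

Dropping the label inside one helper keeps the training and validation matrices consistent, so the same column can't be forgotten on one side only.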
