
I was trying to use xgboost for classification of the iris data, but I am facing this error:

"Error in frankv(predicted) : x is a list, 'cols' can not be 0-length In addition: Warning message: In train.default(x_train, y_train, trControl = ctrl, tuneGrid = xgbgrid, : cannnot compute class probabilities for regression"

I am using the following code. Any help or explanation will be highly appreciated.

data(iris)
library(caret)
library(dplyr)
library(xgboost)

set.seed(123)
index <- createDataPartition(iris$Species, p=0.8, list = FALSE)
trainData <- iris[index,]
testData <- iris[-index,]


x_train = xgb.DMatrix(as.matrix(trainData %>% select(-Species)))
y_train = as.numeric(trainData$Species)



#### Generic control parameters
ctrl <- trainControl(method="repeatedcv", 
                    number=10, 
                    repeats=5,
                    savePredictions=TRUE, 
                    classProbs=TRUE,
                    summaryFunction = twoClassSummary)

xgbgrid <- expand.grid(nrounds = 10,
                    max_depth = 5,
                    eta = 0.05,
                    gamma = 0.01,
                    colsample_bytree = 0.75,
                    min_child_weight = 0,
                    subsample = 0.5,
                    objective = "binary:logitraw",
                    eval_metric = "error")


set.seed(123)
xgb_model = train(x_train, 
                y_train,  
                trControl = ctrl,
                tuneGrid = xgbgrid,
                method = "xgbTree")
  • Take a look at [this](https://stackoverflow.com/questions/23737137/r-caret-train-error-in-evalsummaryfunction-cannnot-compute-class-probabilities). – NelsonGon Jul 15 '19 at 16:05
  • Take a look at this line: `y_train = as.numeric(trainData$Species)`. Also, the `twoClassSummary` function will not be appropriate since Species has three levels. Fix these two and you're good to go. Use `multiClassSummary` instead. Functions in this comment may not be in the correct case (lower/upper). – NelsonGon Jul 15 '19 at 16:12
  • Thanks for identifying the error in the class summary. However, I tried to convert y to a factor with `y_train <- as.factor(as.numeric(trainData$Species))`, but I am getting this error: "Error: At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to X0, X1, X2 . Please use factor levels that can be used as valid R variable names (see ?make.names for help)." – ABS2019 Jul 15 '19 at 16:28
  • Just use `as.factor`, not `as.factor(as.numeric())`, although Species is already a factor in the iris data set, negating the need for that. I ran it without issues; I didn't use your tune grid and stopped the training early as it would take a lot of time, but it was going to work anyway. – NelsonGon Jul 15 '19 at 16:29
  • Yes, now it ran, but no result came out (tried both with and without the grid): "Something is wrong; all the Accuracy metric values are missing: logLoss AUC prAUC Accuracy Kappa Mean_F1 Mean_Sensitivity Mean_Specificity Min. : NA Min. :0.5 Min. : NA Min. : NA Min. : NA Min. : NA Min.... All NA" – ABS2019 Jul 15 '19 at 16:38
  • What metric is "error"? The default is logloss; "error" is not defined. `objective` and `eval_metric` are not available as grid parameters for xgbTree. – NelsonGon Jul 15 '19 at 16:45
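
To illustrate the factor-level point raised in the comments above, here is a minimal sketch (using only the stock iris data): numeric levels like "1" are not valid R variable names, which is what triggers the make.names error when classProbs = TRUE.

# converting Species to numeric and back yields levels "1", "2", "3",
# which are not valid R variable names, so caret errors with classProbs = TRUE
y_bad <- as.factor(as.numeric(iris$Species))
levels(y_bad)                              # "1" "2" "3"

# either keep the original factor, whose levels are already valid names...
y_good <- iris$Species
levels(y_good)                             # "setosa" "versicolor" "virginica"

# ...or sanitize the levels explicitly
levels(y_bad) <- make.names(levels(y_bad))
levels(y_bad)                              # "X1" "X2" "X3"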

1 Answer


There are a few issues:

  1. The outcome variable should be a factor.

  2. The tune grid includes parameters (`objective` and `eval_metric`) that are not tuning parameters for caret's xgbTree method.

  3. Since Species has three levels, using a two-class summary would be inappropriate. Use a multiclass summary instead, via summaryFunction = multiClassSummary.

A working example:

data(iris)
library(caret)
library(dplyr)
library(xgboost)

set.seed(123)
index <- createDataPartition(iris$Species, p=0.8, list = FALSE)
trainData <- iris[index,]
testData <- iris[-index,]


x_train = xgb.DMatrix(as.matrix(trainData %>% select(-Species)))
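# the outcome must stay a factor so caret treats this as classification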
y_train = as.factor(trainData$Species)



#### Generic control parameters
ctrl <- trainControl(method="repeatedcv", 
                     number=10, 
                     repeats=5,
                     savePredictions=TRUE, 
                     classProbs=TRUE,
                     summaryFunction = multiClassSummary)

xgbgrid <- expand.grid(nrounds = 10,
                       max_depth = 5,
                       eta = 0.05,
                       gamma = 0.01,
                       colsample_bytree = 0.75,
                       min_child_weight = 0,
                       subsample = 0.5)


set.seed(123)
xgb_model = train(x_train,
                  y_train,
                  trControl = ctrl,
                  tuneGrid = xgbgrid,
                  method = "xgbTree")
xgb_model
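
To evaluate the fitted model on the held-out split, a minimal sketch (assuming caret's predict.train accepts a plain feature matrix here, which the xgbTree method converts internally):

# build the test features the same way as the training features
x_test <- as.matrix(testData %>% select(-Species))

# predicted classes from the caret model, then a confusion matrix
preds <- predict(xgb_model, newdata = x_test)
confusionMatrix(preds, testData$Species)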