
I am using predict.xgb.Booster with the "multi:softprob" objective and 3 classes, as follows:

library(xgboost)
data(iris)

iris$Species <- as.factor(iris$Species)

# extract 80% random samples as training set
ix <- sample(nrow(iris), 0.8 * nrow(iris))
# full dataset (features = every column except Species)
all <- xgb.DMatrix(data.matrix(iris[, 1:(ncol(iris) - 1)]),
                   label = as.numeric(iris$Species) - 1)
# training set
train <- xgb.DMatrix(data.matrix(iris[ix, 1:(ncol(iris) - 1)]),
                     label = as.numeric(iris$Species[ix]) - 1)
# test set (the remaining 20% of the dataset)
test <- xgb.DMatrix(data.matrix(iris[-ix, 1:(ncol(iris) - 1)]),
                    label = as.numeric(iris$Species[-ix]) - 1)

params <- list(
  objective = "multi:softprob",
  learning_rate = 0.05,
  subsample = 0.9,
  colsample_bynode = 1,
  reg_lambda = 2,
  max_depth = 35,
  num_class = length(unique(iris$Species))
)

# https://www.rdocumentation.org/packages/xgboost/versions/1.4.1.1/topics/xgb.train
mod <- xgb.train(
  params,
  data = train,
  watchlist = list(valid = test),
  early_stopping_rounds = 50,
  print_every_n = 100,
  nrounds = 10000 # upper bound; early stopping ends training sooner
)

pred <- predict(mod, newdata = all, reshape = TRUE)

With the above code, pred looks like:

            V1         V2         V3
1   0.95375967 0.02518489 0.02105547
2   0.95375967 0.02518489 0.02105547
3   0.95375967 0.02518489 0.02105547
...
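Without `reshape = TRUE`, `predict()` for a `multi:softprob` model returns a flat vector of length `nrow * num_class`. As far as I understand the xgboost R documentation, the probabilities of one observation are stored consecutively, so the flat vector has to be filled into a matrix row by row. A minimal sketch of that reshaping, using the probabilities printed above as hand-built input rather than a real `predict()` call:

```r
# Hypothetical flat prediction vector for 3 observations and 3 classes,
# assuming the class probabilities of each observation are consecutive (row-major)
num_class <- 3
flat_pred <- rep(c(0.95375967, 0.02518489, 0.02105547), times = 3)

# byrow = TRUE gives one row per observation and one column per class,
# which should match what reshape = TRUE returns directly
pred_mat <- matrix(flat_pred, ncol = num_class, byrow = TRUE)
print(pred_mat)
```

With `byrow = FALSE` the same vector would be spread down the columns instead, and each row would no longer hold the three probabilities of a single observation.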

I need to create a vector storing, for each row, the class with the highest probability.

For instance, in the above example it would be V1 for all three rows.

My question is: how do I know which of my 3 classes V1 to V3 refer to?
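For the first part, base R's `max.col()` returns, for each row of a matrix, the index of the column holding the maximum; indexing `colnames()` with that index gives the `V1`/`V2`/`V3` label. A sketch on a hand-built matrix that copies the three rows printed above (not real model output):

```r
# Hand-built stand-in for pred, copying the three rows shown in the question
pred <- matrix(rep(c(0.95375967, 0.02518489, 0.02105547), times = 3),
               ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("V1", "V2", "V3")))

# Column index of the largest probability in each row;
# ties.method = "first" makes the result deterministic if two classes tie
# (the default, "random", can pick a different column on each call)
best_col <- max.col(pred, ties.method = "first")
best_name <- colnames(pred)[best_col]
print(best_name)
```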

umbe1987
  • It would be great if you could provide a reproducible example with an inbuilt data set like iris. – missuse Jun 29 '21 at 14:47
  • ok I will work to provide one – umbe1987 Jun 29 '21 at 14:50
  • your second question is a duplicate: https://stackoverflow.com/questions/17735859/for-each-row-return-the-column-name-of-the-largest-value – missuse Jun 29 '21 at 14:50
  • Indeed, I also saw this one https://stackoverflow.com/a/59976894/1979665 I guess the point now is to understand 1. I am going to add a reproducible example with iris dataset. – umbe1987 Jun 29 '21 at 14:56
  • If I had to bet on an answer, I'd bet on one that suggested the order of the columns of the prediction output corresponds to the order of the levels of the response variable. – missuse Jun 29 '21 at 15:00
  • @missuse I modified the question to provide a reproducible example and to remove the duplicated part. And your last comment is probably what I needed. – umbe1987 Jun 29 '21 at 15:27
  • In the above example my comment looks true, first column corresponds to `setosa` (first level), second column to `versicolor` (second level) and third column to `virginica` (third level). In many cases just making a reproducible example solves the question. – missuse Jun 29 '21 at 15:43
  • If you would like to answer the question, I would be happy to accept it. Thanks for your time and help by the way ;) – umbe1987 Jun 30 '21 at 07:07
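Following missuse's suggestion: the labels were built with `as.numeric(iris$Species) - 1`, so label 0 is the first factor level, label 1 the second, and so on, and `multi:softprob` produces one probability column per label in that order. Assuming that holds, the column indices from `max.col()` can be translated back to species names with `levels()`. A sketch using the built-in `iris` data for the encoding check and a hand-built probability matrix (not real model output) for the mapping:

```r
data(iris)

# Same encoding as the xgb.DMatrix labels: factor code minus one
label_codes <- as.numeric(iris$Species) - 1

# Which 0-based label did each species receive?
label_map <- sapply(split(label_codes, iris$Species), unique)
print(label_map)  # setosa 0, versicolor 1, virginica 2

# Hand-built stand-in for pred (one row per observation, one column per class)
pred <- matrix(c(0.95, 0.03, 0.02,
                 0.10, 0.80, 0.10,
                 0.05, 0.15, 0.80),
               ncol = 3, byrow = TRUE)

# Assumption: column k holds the probability of label k - 1, i.e. of level k
colnames(pred) <- levels(iris$Species)
pred_class <- factor(colnames(pred)[max.col(pred, ties.method = "first")],
                     levels = levels(iris$Species))
print(pred_class)
```

Keeping `pred_class` as a factor with the original levels makes it directly comparable with `iris$Species`, for example in `table(pred_class, truth)`.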

0 Answers