How to conduct catboost grid search using GPU in R?

Question

I'm setting up a grid search using the catboost package in R. Following the catboost documentation (https://catboost.ai/docs/), a grid search for hyperparameter tuning can be run with the following three commands in R:

fit_control <- trainControl(method = "cv", number = 4, classProbs = TRUE)
grid <- expand.grid(depth = c(7,8,9,10), learning_rate = c(0.1,0.2,0.3,0.4), iterations = c(10,100,1000))
report <- train(df.scale, as.factor(make.names(as.matrix(tier1))), method = catboost.caret, logging_level = 'Verbose', preProc = NULL, tuneGrid = grid, trControl = fit_control)

This searches across different values for depth, learning rate, and the number of iterations. The commands run fine, but I can't figure out where to put the task_type = "GPU" option. I would appreciate any help on how to make the parameter search use the GPU from R.

score 3 · Answer 1 · answered Aug 20 '21 at 10:24

It can be done the following way:

fit_control <- trainControl(method = "cv", number = 4, classProbs = TRUE)
grid <- expand.grid(depth = c(7,8,9,10), learning_rate = c(0.1,0.2,0.3,0.4), iterations = c(10,100,1000))
report <- train(df.scale, as.factor(make.names(as.matrix(tier1))), method = catboost.caret, logging_level = 'Verbose', preProc = NULL, tuneGrid = grid, trControl = fit_control,
task_type = "GPU")

This works because of how R handles the ellipsis (...) argument. All arguments that caret's train function does not recognize itself are eventually passed to catboost.caret$fit and treated as training parameters for catboost. The place in the catboost code where this happens is:

...
catboost.caret$fit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
  param <- c(param, list(...)) # all ellipsis args are taken to param
  if (is.null(param$loss_function)) {
...
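
The forwarding can be reproduced without caret or catboost installed. The following is a minimal sketch in base R; toy_train and toy_fit are hypothetical stand-ins for train and catboost.caret$fit, not the real functions, and they only mimic the merge of the ellipsis arguments into param:

```r
# Hypothetical stand-in for catboost.caret$fit: it merges everything
# received through ... into the parameter list, like the excerpt above.
toy_fit <- function(x, y, param, ...) {
  param <- c(param, list(...))
  param
}

# Hypothetical stand-in for train: it knows nothing about task_type,
# so that argument falls into ... and is forwarded unchanged.
toy_train <- function(x, y, tune_row, ...) {
  toy_fit(x, y, param = tune_row, ...)
}

merged <- toy_train(
  x = NULL, y = NULL,
  tune_row = list(depth = 7, learning_rate = 0.1),
  task_type = "GPU"
)
str(merged)
```

The resulting list holds the tuning values together with task_type, which is the shape of the parameter list that the real fit function hands to catboost.train.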

If you pass a parameter that catboost does not know through train, catboost will raise an error, which caret reports as a failed model fit:

report <- train(x, as.factor(make.names(y)),
            method = catboost.caret,
            logging_level = 'Verbose', preProc = NULL,
            tuneGrid = grid, trControl = fit_control, what_is_this = "GPU") 
> warnings()
Warning messages:
1: model fit failed for Fold1: depth=4, learning_rate=0.1, l2_leaf_reg=0.001, rsm=1, border_count=64, iterations=100 Error in catboost.train(pool, test_pool, param) : 
  catboost/private/libs/options/plain_options_helper.cpp:501: Unknown option {what_is_this} with value "GPU"

Nikhil Gupta · Answer 2 · 2020-03-31T04:06:33.353

It looks like you are using the caret package to perform the training. In that case, caret does not appear to pass any additional arguments to the catboost.train function, so it may not support the GPU functionality. In the code for this method, the ... argument is not passed to catboost.train directly:

#' Fit model based on input data
#'
#' @param x, y: the current data used to fit the model
#' @param wts: optional instance weights (not applicable for this particular model)
#' @param param: the current tuning parameter values
#' @param lev: the class levels of the outcome (or NULL in regression)
#' @param last: a logical for whether the current fit is the final fit
#' @param weights: weights
#' @param classProbs: a logical for whether class probabilities should be computed
#'
#' @noRd
catboost.caret$fit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
    param <- c(param, list(...))
    if (is.null(param$loss_function)) {
        param$loss_function <- "RMSE"
        if (is.factor(y)) {
            param$loss_function <- "Logloss"
            if (length(lev) > 2) {
                param$loss_function <- "MultiClass"
            }
            y <- as.double(y) - 1
        }
    }
    test_pool <- NULL
    if (!is.null(param$test_pool)) {
        test_pool <- param$test_pool
        if (class(test_pool) != "catboost.Pool")
            stop("Expected catboost.Pool, got: ", class(test_pool))
        param <- within(param, rm(test_pool))
    }
    pool <- catboost.from_data_frame(x, y, weight = wts)
    model <- catboost.train(pool, test_pool, param)
    model$lev <- lev
    return(model)
}

Depending on your level of proficiency in R and caret, you can add your own model to caret: copy the model from the caret GitHub location and modify it to accept the GPU argument, which should go into the parameter list per their documentation.
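
If you go the custom-model route, you may not need to copy the whole method. Below is a minimal sketch in base R, assuming a caret-style model is a list whose fit element has the signature shown above. The helper with_fixed_params and the toy_model used to exercise it are hypothetical names for illustration and are not part of caret or catboost:

```r
# Return a copy of a caret-style model list whose fit function always
# receives the given extra arguments (hypothetical helper).
with_fixed_params <- function(model, ...) {
  fixed <- list(...)
  inner_fit <- model$fit
  model$fit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
    args <- c(
      list(x, y, wts, param, lev, last, weights, classProbs),
      fixed,      # arguments pinned when the wrapper was built
      list(...)   # anything else the caller passes through
    )
    do.call(inner_fit, args)
  }
  model
}

# Toy model that mimics only the parameter merge of catboost.caret$fit.
toy_model <- list(
  fit = function(x, y, wts, param, lev, last, weights, classProbs, ...) {
    c(param, list(...))
  }
)

gpu_model <- with_fixed_params(toy_model, task_type = "GPU")
result <- gpu_model$fit(
  x = NULL, y = NULL, wts = NULL, param = list(depth = 7),
  lev = NULL, last = FALSE, weights = NULL, classProbs = TRUE
)
str(result)
```

With catboost loaded, the same helper could presumably be applied as with_fixed_params(catboost.caret, task_type = "GPU") and the result passed as method to train, but that is an assumption and has not been checked on a GPU machine.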
