3

I am trying to model a good example of bartMachine usage in Caret, and I can't seem to model a bartMachine with Caret correctly, can anyone tell me, what is exactly the main error means? or is there a simple reproducible code for BART Modeling?

Here is the snippets I use to model a bartMachine using some dummy data of HouseVotes84 and cars dataset:

library(mlbench)
library(caret)

data("HouseVotes84")
#Using HouseVotes84 as Classification Task Dataset and mtcars as Regression Task Dataset
dummy_data_classif <- HouseVotes84[,2:length(colnames(HouseVotes84))] %>% 
  mutate_if(is.factor, as.numeric)
dummy_data_classif <- data.frame(cbind(Class=HouseVotes84[,1], dummy_data_classif))
dummy_data_classif[is.na(dummy_data_classif)] <- 0

data("cars")
dummy_data_regr <- cars

caret_method_tester <- function(dummy_data, formula, resample_plan=1, 
                                test_method, time_limit=30, 
                                grid_param=c(), parallel_mode=FALSE){
  library(caret)
  library(R.utils)
  formula <- as.formula(formula)
  resampling <- NULL
  if(resample_plan==1){
    resampling <- trainControl(method = "repeatedcv",
                               number = 10,
                               repeats = 5,
                               allowParallel = parallel_mode) 
  }
  else if(resample_plan==2){
    resampling <- trainControl(method = "cv",
                               number = 5,
                               allowParallel = parallel_mode) 
  }
  else if(resample_plan==3){
    resampling <- trainControl(method = "adaptive_cv",
                               number = 10, repeats = 5,
                               allowParallel = parallel_mode,
                               adaptive = list(min = 3, alpha = 0.05, 
                                               method = "BT", complete = FALSE))
  }
  else if(resample_plan==4){
    resampling <- trainControl(method = "boot",
                               number = 5,
                               allowParallel = parallel_mode)
  }
  else if(resample_plan==5){
    resampling <- trainControl(method = "boot_all",
                               number = 5,
                               allowParallel = parallel_mode)
  }
  tryCatch(
    expr={
      if(length(grid_param) > 0){
        withTimeout(
          model <- caret::train(formula, 
                       data = dummy_data, 
                       method = test_method, 
                       trControl = resampling,
                       tuneGrid=grid_param), timeout = 300
        )
      }
      else{
        withTimeout(
          model <- caret::train(formula, 
                                data = dummy_data, 
                                method = test_method, 
                                trControl = resampling), timeout=300   
        )
        
      }
      return(model)
    },
    error=function(cond){
      message("Test Model Failed")
      message("Here's the original error message:")
      message(cond)
      return(NULL)
    },
    warning=function(cond){
      message("Warning Triggered!")
      message("Here's the original warning message:")
      message(cond)
      return(model)
    }
  )
}

bart_reg <- caret_method_tester(dummy_data_regr, "Price ~ .", 
                test_method="bartMachine", time_limit=30, resample_plan=2)

Test Model Failed
Here's the original error message:
argument is of length zero

bart_classif <- caret_method_tester(dummy_data_classif, "Class ~ .", 
                test_method="bartMachine", time_limit=30, resample_plan=2)

Test Model Failed
Here's the original error message:
incorrect number of dimensions

I used try Catch method to easily notify things about the code progress, so it is clear when the code fails, issues warning, or is successful.

the dataset also doesn't have any NA Values as far as I am concerned

StupidWolf
  • 45,075
  • 17
  • 40
  • 72
Jovan
  • 763
  • 7
  • 26

1 Answers1

5

Would be much better if you reduce the code to the essential part, basically the train function with bartMachine doesn't work. We can illustrate that with this example and we get the same error message:

mdl = train(mpg ~ .,data=mtcars,method="bartMachine",trControl=trainControl(method="cv"))
Error in if (grepl("adaptive", trControl$method) & nrow(tuneGrid) == 1) { : 
  argument is of length zero

The error is a bug with the code in caret, if you don't provide the tuning grid, the default function used to create it does not return a data.frame:

getModelInfo()$bartMachine$grid
function(x, y, len = NULL, search = "grid") {
                    if(search == "grid") {
                      out <- expand.grid(num_trees = 50,
                                         k = (1:len)+ 1,
                                         alpha = seq(.9, .99, length = len),
                                         beta = seq(1, 3, length = len),
                                         nu =  (1:len)+ 1)
                    } else {
                      out <- data.frame(num_trees = sample(10:100, replace = TRUE, size = len),
                                        k = runif(len, min = 0, max = 5),
                                        alpha = runif(len, min = .9, max = 1),
                                        beta = runif(len, min = 0, max = 4),
                                        nu = runif(len, min = 0, max = 5))
                    }
                    if(is.factor(y)) {
                      out$k <- NA
                      out$nu <- NA
                    }
                  }

You can either provide a tune grid:

mdl = train(mpg ~ .,data=mtcars,method="bartMachine",
trControl=trainControl(method="boot"),
tuneGrid=data.frame(num_trees=50,k=3,alpha=0.1,beta=0.1,nu=4))

mdl

Bayesian Additive Regression Trees 

32 samples
10 predictors

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 32, 32, 32, 32, 32, 32, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  2.826126  0.8344417  2.292464

Tuning parameter 'num_trees' was held constant at a value of 50
 'beta' was held constant at a value of 0.1
Tuning parameter 'nu' was
 held constant at a value of 4

Or you fix the function above and create a new method, you can read more here:

newBartMachine = getModelInfo()$bartMachine

newBartMachine$grid = function(x, y, len = NULL, search = "grid") {
                    if(search == "grid") {
                      out <- expand.grid(num_trees = 50,
                                         k = (1:len)+ 1,
                                         alpha = seq(.9, .99, length = len),
                                         beta = seq(1, 3, length = len),
                                         nu =  (1:len)+ 1)
                    } else {
                      out <- data.frame(num_trees = sample(10:100, replace = TRUE, size = len),
                                        k = runif(len, min = 0, max = 5),
                                        alpha = runif(len, min = .9, max = 1),
                                        beta = runif(len, min = 0, max = 4),
                                        nu = runif(len, min = 0, max = 5))
                    }
                    if(is.factor(y)) {
                      out$k <- NA
                      out$nu <- NA
                    }
                    return(out)
                  }
mdl = train(mpg ~ .,data=mtcars,method=newBartMachine,trControl=trainControl(method="cv"),tuneLength=1)

Bayesian Additive Regression Trees 

32 samples
10 predictors

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 28, 28, 28, 29, 30, 30, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  2.338429  0.9581958  2.057181

Tuning parameter 'num_trees' was held constant at a value of 50
 'beta' was held constant at a value of 1
Tuning parameter 'nu' was
 held constant at a value of 2
StupidWolf
  • 45,075
  • 17
  • 40
  • 72
  • I see, so the original bartMachine code didn't return the dataframe.. really amusing, @StupidWolf thanks for showing a good example :) – Jovan Jul 20 '20 at 02:54
  • you're welcome.. hope it's helpful. bartMachine takes quite a bit of time to run.. so not commonly used i guess – StupidWolf Jul 20 '20 at 14:23