
I encounter a strange problem when trying to train a model in R using caret:

> bart <- train(x = cor_data, y = factor(outcome), method = "bartMachine")
Error in tuneGrid[!duplicated(tuneGrid), , drop = FALSE] : 
  incorrect number of dimensions

However, when using rf, xgbTree, glmnet, or svmRadial instead of bartMachine, no error is raised. Moreover, dim(cor_data) and length(outcome) return [1] 3056 134 and [1] 3056 respectively, which indicates that there is indeed no issue with the dimensions of my dataset.

I have tried changing the tuneGrid parameter in train, which resolved that problem but raised this one instead:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "pool-89-thread-1"
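For reference, the grid I passed was along these lines (caret's bartMachine method tunes num_trees, k, alpha, beta and nu; the exact values shown here are only illustrative, not my actual grid):

library(caret)

# a single-row grid skips caret's internal grid construction entirely
bart_grid <- expand.grid(
  num_trees = 50,   # trees per BART model
  k         = 2,    # prior scaling of the outcome range
  alpha     = 0.95, # tree-structure prior: base
  beta      = 2,    # tree-structure prior: power
  nu        = 3     # error-variance prior degrees of freedom
)
bart <- train(x = cor_data, y = factor(outcome),
              method = "bartMachine", tuneGrid = bart_grid)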

My dataset contains no NAs, and all variables are either numeric or binary.

My goal is to extract the most important variables in the bart model. For example, for random forests I use:

rf <- train(x = cor_data, y = factor(outcome), method = "rf")
rfImp <- varImp(rf)
# names of the 43 most important variables, sorted by decreasing importance
rf_select <- row.names(rfImp$importance[order(-rfImp$importance$Overall)[1:43], , drop = FALSE])

Thank you in advance for your help.

Ignis Oculo
  • OK, so the issue is with Java. I suggest updating your question, since you solved the tuneGrid issue; check this: https://stackoverflow.com/questions/34624002/r-error-java-lang-outofmemoryerror-java-heap-space – StupidWolf Jul 23 '20 at 17:00

1 Answer


Since your goal is to extract the most important variables from the bart model, I will assume you are willing to bypass the caret wrapper and use the bartMachine package directly, which is the only way I could get it to run successfully.

For my system, solving the memory issue required two further things:

  1. Restart R and, before loading anything, allocate 8 GB of memory to the JVM like so:
options(java.parameters = "-Xmx8g")
  2. When running bartMachineCV, turn off mem_cache_for_speed:
library(bartMachine)
set_bart_machine_num_cores(16)
# X must be a data.frame; coerce with as.data.frame() if cor_data is a matrix
bart <- bartMachineCV(X = cor_data, y = factor(outcome), mem_cache_for_speed = FALSE)

This will iterate through 3 values of k (2, 3 and 5) and 2 values of m (50 and 200), running 5-fold cross-validation for each combination, then build a bartMachine using the best hyperparameter combination. You may also have to reduce the number of cores depending on your system, but this took about an hour on a 20,000-observation x 12-variable training set on 16 cores. You could also reduce the number of hyperparameter combinations it tests using the k_cvs and num_tree_cvs arguments, as sketched below.
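For example, something like this restricts the search (the particular values are just an illustration):

# try only k = 2 or 3 with 50 trees instead of the full default grid
bart <- bartMachineCV(X = cor_data, y = factor(outcome),
                      k_cvs = c(2, 3),
                      num_tree_cvs = c(50),
                      mem_cache_for_speed = FALSE)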

Then to get the variable importance:

vi <- investigate_var_importance(bart, num_replicates_for_avg = 20)
print(vi)
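If you want the top variables by name, mirroring the rf_select step in the question, something along these lines should work (investigate_var_importance returns a list whose avg_var_props element holds each variable's average inclusion proportion; the cutoff of 43 just matches the random-forest example):

# names of the 43 variables with the highest average inclusion proportions
bart_select <- names(sort(vi$avg_var_props, decreasing = TRUE))[1:43]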

You can also use it as a predictive model with predict(bart, new_data = new), similar to the object normally returned by caret::train(). This worked on R 4.0.5, bartMachine 1.2.6 and rJava 1.0-4.
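As a quick usage sketch (new is assumed here to be a data.frame with the same predictor columns as cor_data):

# new_data must be a data.frame containing the training predictors
preds <- predict(bart, new_data = new)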

Stu2