
I'm trying to use mlrMBO to tune hyperparameters with parallel computation. I was unfamiliar with parallel computation before this, but I've read that it speeds up computation. However, when I run the following code in parallel, the program just hangs; when I run it without parallelization, only the first iteration runs. I'm not sure what I'm doing wrong, and I couldn't find a solution anywhere. Any help would be appreciated.

library(mlbench)      # PimaIndiansDiabetes2 data
library(caret)        # createDataPartition()
library(dplyr)        # %>%
library(mlr); library(mlrMBO); library(parallelMap)

data("PimaIndiansDiabetes2", package = "mlbench")
set.seed(123)
training.samples <- PimaIndiansDiabetes2$diabetes %>% 
  createDataPartition(p = 0.8, list = FALSE)
train.data <- PimaIndiansDiabetes2[training.samples, ]
test.data  <- PimaIndiansDiabetes2[-training.samples, ]

lrn <- makeLearner("classif.xgboost", eval_metric = "auc", predict.type = "prob")
# scale_pos_weight is not in mlr's built-in xgboost param set, so register it manually
lrn$par.set <- c(lrn$par.set,
                 makeParamSet(
                   makeNumericLearnerParam("scale_pos_weight")))

ps <- makeParamSet(
  makeNumericParam("eta",              lower = 0.001, upper = 0.5),
  makeNumericParam("gamma",            lower = 0,   upper = 5),
  makeIntegerParam("max_depth",        lower = 3,   upper = 6),
  makeIntegerParam("min_child_weight", lower = 1,   upper = 10),
  makeNumericParam("subsample",        lower = 0.6, upper = 0.8),
  makeNumericParam("colsample_bytree", lower = 0.5, upper = 0.7),
  makeIntegerParam("nrounds",          lower = 25,  upper = 2000),
  makeIntegerParam("scale_pos_weight", lower = 30, upper = 100),
  makeNumericParam("lambda", lower = -1, upper = 0, trafo = function(x) 10^x),
  makeIntegerParam("max_delta_step", lower = 1, upper = 10)
)
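Note that `lambda` is searched on a log10 scale: the `trafo` is applied before the value reaches xgboost. A quick sketch of what the optimizer actually passes (the function below just mirrors the trafo in the param set):

```r
# lambda is searched in [-1, 0]; the trafo maps it back to the natural scale,
# so xgboost sees values in [0.1, 1].
trafo <- function(x) 10^x
trafo(-1)  # 0.1
trafo(0)   # 1
```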

task <- makeClassifTask(data = train.data, target = "diabetes", positive = "pos")

mbo.ctrl <- makeMBOControl()
mbo.ctrl <- setMBOControlTermination(mbo.ctrl, iters = 10)

surrogate.lrn <- makeLearner("regr.km", predict.type = "se")

design.mat <- generateRandomDesign(n = 10, par.set = ps)
# makeTuneControlMBO is exported by mlr, so ::: is not needed
ctrl <- makeTuneControlMBO(learner = surrogate.lrn,
                           mbo.control = mbo.ctrl,
                           mbo.design = design.mat)


# Tuning hyperparameters
parallelStartMulticore(cpus = 24L)
res.mbo <- tuneParams(learner = lrn, 
                     task = task, 
                     resampling = cv10, 
                     par.set = ps, 
                     control = ctrl, 
                     show.info = TRUE, 
                     measures = auc)
parallelStop()
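As an aside (a sketch, not part of the original code): parallelMap registers named parallelization levels, and the `level` argument of `parallelStartMulticore()` controls which one is parallelized. If I understand the API correctly, the registered levels can be inspected like this:

```r
library(parallelMap)
library(mlr)      # registers the mlr.* levels on load
library(mlrMBO)   # registers mlrMBO.feval

# When no level is given, parallelMap parallelizes the first parallel region
# it reaches -- here that is mlrMBO.feval, as the log output below shows.
parallelGetRegisteredLevels()
```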

This is what is shown in the console:

> res.mbo <- tuneParams(learner = lrn, 
+                      task = task, 
+                      resampling = cv10, 
+                      par.set = ps, 
+                      control = ctrl, 
+                      show.info = TRUE, 
+                      measures = auc)
[Tune] Started tuning learner classif.xgboost for parameter set:
                    Type len Def       Constr Req Tunable Trafo
eta              numeric   -   - 0.001 to 0.5   -    TRUE     -
gamma            numeric   -   -       0 to 5   -    TRUE     -
max_depth        integer   -   -       3 to 6   -    TRUE     -
min_child_weight integer   -   -      1 to 10   -    TRUE     -
subsample        numeric   -   -   0.6 to 0.8   -    TRUE     -
colsample_bytree numeric   -   -   0.5 to 0.7   -    TRUE     -
nrounds          integer   -   -  25 to 2e+03   -    TRUE     -
scale_pos_weight integer   -   -    30 to 100   -    TRUE     -
lambda           numeric   -   -      -1 to 0   -    TRUE     Y
max_delta_step   integer   -   -      1 to 10   -    TRUE     -
With control class: TuneControlMBO
Imputation value: -0
Mapping in parallel: mode = multicore; level = mlrMBO.feval; cpus = 24; elements = 10.
Asked by user122514
  • Please do not cross-post questions across sites! Writing this here because I deleted the same question in two repos on Github... – pat-s Dec 19 '19 at 22:00
  • Please provide a _minimal_ example using _formatted_ code. Have a look [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for guidance. On the first look, the Q looks like you've tried a few minutes and then got tired (both in content and formatting). Happy to answer once there is a minimal, well formatted reprex. (and yes, I clearly dislike cross-posting AND messy code in questions). – pat-s Dec 19 '19 at 22:08
  • 1
    My code is formatted according to the guide you sent me. In what way is my code "messy"? I provided a reproducible dataset that is available through a package and did not even use my own dataset. I don't even know what you're critiquing when there's nothing to nitpick about my code and how I've formatted it on this page. And no, I did not wait a few minutes. It's been multiple hours. – user122514 Dec 19 '19 at 22:29
  • what happens if you change the parellelisation code to `parallelStartMulticore(cpus = 10L, level = "mlr.resample")`? – missuse Dec 20 '19 at 13:18
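For reference, missuse's suggestion applied to the tuning call above would look like this (a sketch; the cpu count and level name are taken from the comment, not verified here):

```r
library(parallelMap)

# Parallelize the cross-validation folds (level "mlr.resample") instead of
# mlrMBO's proposed-point evaluations (level "mlrMBO.feval").
parallelStartMulticore(cpus = 10L, level = "mlr.resample")
res.mbo <- tuneParams(learner = lrn, task = task, resampling = cv10,
                      par.set = ps, control = ctrl,
                      show.info = TRUE, measures = auc)
parallelStop()
```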

0 Answers