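I would like to use R's mlr3 family of packages to build machine learning models in a reproducible manner. I have tried the `regr.glmboost` learner with the `mbo` tuner and the `run_time` terminator. I have experimented with the hyperparameter optimization (HPO) settings, but I have not been able to make the results reproducible with longer run times: repeated runs with the same seed return different best configurations. Where did I go wrong?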
Here is a reprex that demonstrates the problem:
```r
library(mlr3verse)
library(mlr3extralearners) # provides the regr.glmboost learner
library(mlr3mbo)
library(mlr3misc)
library(magrittr)
library(nycflights13)

# Hourly weather data ordered by time, with categorical time features
dt <- as.data.table(weather)
dt <- dt[
  order(time_hour),
  .(
    origin = as.factor(origin), month = as.factor(month), hour = as.factor(hour),
    temp, dewp, humid, wind_dir, wind_speed, precip, visib, pressure,
    time_hour = as.numeric(time_hour)
  )
]
dt <- na.omit(dt)

# Run the whole tuning three times with the same seed and collect the best configurations
best_ones <- map_dtr(
  1L:3L,
  function(i) {
    my_learner <- lrn("regr.glmboost",
      family = to_tune(p_fct(levels = c("Gaussian", "Laplace", "Huber"))),
      nuirange = to_tune(p_dbl(lower = 0, upper = 1000, logscale = FALSE)),
      mstop = to_tune(p_int(lower = 1, upper = 3, trafo = function(x) 10**x)),
      nu = to_tune(p_dbl(lower = 0.01, upper = 0.3, logscale = TRUE)),
      risk = to_tune(p_fct(levels = c("inbag", "oobag", "none"))),
      trace = to_tune(c(TRUE, FALSE)),
      stopintern = to_tune(c(TRUE, FALSE))
    )
    my_task <- as_task_regr(
      x = dt,
      target = "pressure",
      id = "weather_data"
    )
    my_instance <- ti(
      task = my_task,
      learner = my_learner,
      resampling = rsmp("cv", folds = 3),
      measures = msr("regr.mae"),
      terminator = trm("run_time", secs = 300) # stop after 300 s of wall-clock time
    )
    my_tuner <- tnr("mbo")
    set.seed(1234L, kind = "L'Ecuyer-CMRG")
    my_tuner$optimize(my_instance)
    my_instance$archive$best()
  }
)
best_ones[]
```
These are the (somewhat diverse) best configurations I got from the three runs:
family | nuirange | mstop | nu | risk | trace | stopintern | regr.mae | warnings | errors | runtime_learners | uhash | timestamp | batch_nr | acq_ei | .already_evaluated |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Huber | 841.3256 | 3 | -2.794395 | inbag | FALSE | FALSE | 5.090834 | 0 | 0 | 9.656 | 01cf38ab-3dc6-4490-b36e-1c14325e42ad | 2023-01-10 17:08:15 | 26 | 0.0010821 | FALSE |
Huber | 849.4117 | 3 | -2.774291 | oobag | FALSE | FALSE | 5.094204 | 0 | 0 | 9.646 | 6579c965-9184-4fe3-8e01-c1b10df21782 | 2023-01-10 17:11:56 | 18 | 0.0021940 | FALSE |
Huber | 855.7414 | 3 | -2.878846 | oobag | FALSE | FALSE | 5.096876 | 0 | 0 | 9.497 | 458122cc-f51c-4d81-a6d2-93dc024baa58 | 2023-01-10 17:16:22 | 15 | 0.0090615 | FALSE |
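Note that the `nu` and `mstop` columns are on the search-space scale, not the learner's scale: `nu` was tuned with `logscale = TRUE` (so the archive stores log(nu)) and `mstop` was tuned on 1 to 3 with the trafo `10**x`. A small sketch that maps the three best rows back (the vectors are just copied from the table; as far as I know the archive also keeps the transformed values in its `x_domain` column):

```r
# Best rows from the table, as stored on the search-space scale
nu_log <- c(-2.794395, -2.774291, -2.878846)
mstop_log10 <- c(3, 3, 3)

# nu was tuned with logscale = TRUE, i.e. on log(nu); exp() undoes that
nu <- exp(nu_log)
# mstop was tuned on x in [1, 3] with the trafo 10^x
mstop <- 10^mstop_log10

round(nu, 4)
mstop
```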
I suspect the issue is related to seeding, but I do not know how to set the seed properly in this case. Any help would be appreciated!
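For comparison, here is a minimal sketch of a setup I would expect to be reproducible, based on two assumptions I have not verified for MBO: (1) with `trm("run_time")` the number of evaluations depends on how fast the machine happens to be, so only a fixed budget such as `trm("evals")` can give identical archives; (2) the CV splits are drawn when the instance is created, so `set.seed()` has to come before `ti()` (in my reprex it comes after). To keep it fast, it swaps in `regr.rpart`, the built-in `mtcars` task and random search instead of glmboost, the weather data and MBO; `run_once()` is a hypothetical helper.

```r
library(mlr3)
library(mlr3tuning)
library(paradox)

# Hypothetical helper: one complete tuning run with a fixed seed
run_once <- function(seed) {
  set.seed(seed) # seeded before ti(), so the CV splits are seeded as well
  learner <- lrn("regr.rpart", cp = to_tune(1e-4, 1e-1, logscale = TRUE))
  instance <- ti(
    task = tsk("mtcars"),
    learner = learner,
    resampling = rsmp("cv", folds = 3),
    measures = msr("regr.mae"),
    terminator = trm("evals", n_evals = 10) # fixed evaluation budget instead of wall-clock time
  )
  tuner <- tnr("random_search", batch_size = 1)
  tuner$optimize(instance)
  # Evaluated configurations (cp on the log scale) and their scores
  instance$archive$data[, c("cp", "regr.mae")]
}

a <- run_once(1234L)
b <- run_once(1234L)
all.equal(a, b)
```

If the same pattern (seed before `ti()`, evaluation-based terminator) makes the glmboost/MBO runs agree, that would point to the terminator and seed placement rather than to the learner itself.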