I'd like to train 3 models in MLJ.jl: ARDRegressor, AdaBoostRegressor, BaggingRegressor

Currently, I train them one at a time; for example:

using Pkg; Pkg.activate("."); Pkg.instantiate()
using RDatasets, MLJ, Statistics, PrettyPrinting, GLM
X, y = @load_boston; train, test = 1:406, 407:506

@load ARDRegressor
reg = ARDRegressor
m = machine(reg(), X, y);
fit!(m, rows=train);
ŷ = predict(m, rows=test)
os_ARDRegressor = rms(ŷ , y[test])

I'd like to train them with a loop such as:

modlist = [ARDRegressor; AdaBoostRegressor; BaggingRegressor]  
score = []

for (i, mod) in enumerate(modlist)
@load mod;
reg = mod;
m = machine(reg(), X, y);
fit!(m, rows=train);
ŷ = predict(m, rows=test)
push!( score,  (i, mod, rms(ŷ , y[test]))  )
end

1 Answer

There are a few issues with your last code block.

  • The loop for jj in eachindex(Models) iterates over the indices of the Models array, so jj takes the values 1, 2, 3. Loop over the Models array directly instead.

  • @load ARDRegressor is a macro invocation. Macros are expanded at parse time, so @load jj receives the literal symbol jj, not the model name stored in the variable jj as you intended.

  • The value of os_jj will be overwritten on every iteration of the loop. Instead, store each score in an array at the corresponding index: os[jj] = ...

  • MLJ requires you to import the packages that contain the models before loading them. Note that using ScikitLearn requires the sklearn Python package to be installed in your current Python environment.
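To see the macro point concretely, here is a minimal sketch in plain Julia (no MLJ required). show_argument is a toy macro invented purely for illustration; it simply returns the expression it was called with:

```julia
# Macros run at parse time and receive the source expression,
# not the runtime value of a variable.
macro show_argument(x)
    # x is the unevaluated expression passed to the macro
    return QuoteNode(x)
end

jj = "ARDRegressor"
@show_argument(jj) == :jj   # true: the macro saw the symbol jj,
                            # not the string "ARDRegressor"
```

This is why @load jj cannot dispatch on the model name stored in jj, and why the models must be loaded outside the loop.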

Consider the following working code example:

using MLJ, RDatasets
X, y = @load_boston
train, test = 1:406, 407:506

# @load imports the model code and returns an instance of the model type,
# so it is called once per model, outside the loop.
models = [@load ARDRegressor; @load AdaBoostRegressor; @load BaggingRegressor]

score = Array{Float64}(undef, 3)
for (i, model) in enumerate(models)
    m = machine(model, X, y)
    fit!(m, rows=train)
    ŷ = predict(m, rows=test)
    score[i] = rms(ŷ, y[test])
end
@show score
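If you also want to see which score belongs to which model, a small variation of the loop above (a sketch assuming the same models, X, y, train, and test as in the example) can collect (name, score) pairs instead:

```julia
# Sketch: pair each model's name with its out-of-sample RMS.
# Assumes models, X, y, train and test are defined as in the example above.
results = Tuple{String, Float64}[]
for model in models
    m = machine(model, X, y)
    fit!(m, rows=train)
    ŷ = predict(m, rows=test)
    name = string(typeof(model).name.name)   # e.g. "ARDRegressor"
    push!(results, (name, rms(ŷ, y[test])))
end
@show results
```

The name is recovered from the model instance's type, so no separate list of labels has to be maintained alongside the models.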

Side note: using Pkg; Pkg.activate(".") is unnecessary if you start Julia with julia --project. But this comes down to personal preference.

  • I'm trying to do something similar but with my own code. How should one do it when each machine takes a different number of parameters? For instance, I have kernel regression, a model that takes a sampling mask with the positions of the samples. Here's the question I asked about this: https://stackoverflow.com/questions/59473668/iterating-over-different-functions-with-different-number-of-parameters-in-julia?noredirect=1#comment105126050_59473668 I'd appreciate any help. – Pedro G. Dec 25 '19 at 00:32