I'm trying to use Optim in Julia to solve a two-variable minimization problem, similar to the following:

using Optim

x = [1.0, 2.0, 3.0]
y = 1.0 .+ 2.0 .* x .+ [-0.3, 0.3, -0.1]

# sum of squared errors for the linear model Y ≈ betas[1] + betas[2] * X
function sqerror(betas, X, Y)
    err = 0.0
    for i in 1:length(X)
        pred_i = betas[1] + betas[2] * X[i]
        err += (Y[i] - pred_i)^2
    end
    return err
end

res = optimize(b -> sqerror(b, x, y), [0.0,0.0])
res.minimizer

I do not quite understand what [0.0, 0.0] means. From the documentation at http://julianlsolvers.github.io/Optim.jl/v0.9.3/user/minimization/, my understanding is that it is the initial condition. However, if I change it to [0.0, 0.0, 0.0], the algorithm still works despite the fact that I only have two unknowns, and it returns a minimizer with three components instead of two. I was wondering if anyone knows what [0.0, 0.0] really stands for.

jgr
  • Note that you are not looking at the latest documentation. I'm not sure how you landed on v0.9.3. The latest docs are here: https://julianlsolvers.github.io/Optim.jl/stable/ – pkofod May 30 '23 at 07:38

1 Answer

It is the initial value. optimize by itself cannot know how many parameters your sqerror function takes; you tell it by passing this initial value, and its length sets the dimension of the search space.
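For instance (a minimal sketch of my own, using Optim.minimizer, the accessor for the solution vector), the minimizer you get back always has the same length as the starting point you pass:

res2 = optimize(b -> sqerror(b, x, y), [0.0, 0.0])
length(Optim.minimizer(res2))  # 2

res3 = optimize(b -> sqerror(b, x, y), [0.0, 0.0, 0.0])
length(Optim.minimizer(res3))  # 3: the extra coordinate is simply searched over too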

For example, if you add a dimensionality check to sqerror, you will get a proper error:

julia> function sqerror(betas::AbstractVector, X::AbstractVector, Y::AbstractVector)
           @assert length(betas) == 2
           err = 0.0
           for i in eachindex(X, Y)
               pred_i = betas[1] + betas[2] * X[i]
               err += (Y[i] - pred_i)^2
           end
           return err
       end
sqerror (generic function with 2 methods)

julia> optimize(b -> sqerror(b, x, y), [0.0,0.0,0.0])
ERROR: AssertionError: length(betas) == 2

Note that I also changed the loop range to eachindex(X, Y), which ensures that the X and Y vectors have matching indices.
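As a small illustration (hypothetical vectors, not from the question), eachindex over several arrays fails up front with a DimensionMismatch when their indices disagree, instead of silently ignoring the tail of the longer vector:

X_bad = [1.0, 2.0, 3.0]
Y_bad = [1.0, 2.0]       # one element short
eachindex(X_bad, Y_bad)  # ERROR: DimensionMismatch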

Finally, if you care about performance and want to reduce compilation cost (e.g. if you run this optimization many times), it is better to define your objective function like this:

objective_factory(x, y) = b -> sqerror(b, x, y)
optimize(objective_factory(x, y), [0.0,0.0])
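The reason this helps (my gloss, not part of the original answer): objective_factory is defined once, so every closure it returns has the same concrete type and optimize is compiled for it only once, whereas a fresh anonymous function written inline at each call site has a new type and triggers recompilation. A sketch of repeated fits, with made-up noise vectors:

for noise in ([-0.3, 0.3, -0.1], [0.1, -0.2, 0.2])
    y_i = 1.0 .+ 2.0 .* x .+ noise
    res = optimize(objective_factory(x, y_i), [0.0, 0.0])
    println(Optim.minimizer(res))  # both fits should be close to [1.0, 2.0]
end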
Bogumił Kamiński
  • Thank you, this is very clear! But in my case, what exactly is the third value ``optimize`` gives me if I use ``[0.0, 0.0, 0.0]`` as the initial value? I assume output[1] should be betas[1] and output[2] should be betas[2]? – jgr Dec 11 '22 at 18:26
  • The third output is garbage. The optimizer finds that it has no influence on the objective function, so it treats any value as equally good; since the optimizer does not care about coordinates that do not affect the objective, you get some arbitrary value (see the sketch after these comments). – Bogumił Kamiński Dec 11 '22 at 18:42
  • This makes sense. But when I tried ``[0., 0.]``, the result was ``0.7666453239907458, 2.1000018115207224``; for ``[0., 0., 0.]``, it was ``0.7667381494079513, 2.0999613355991067, -0.5879115555905301``. I'm sorry for bothering you so many times, and I really appreciate your patience. – jgr Dec 12 '22 at 00:48
  • Note that the solutions are very close: they both approximate the true unknown solution. `optimize` produces an approximate solution to your optimization problem. If you change its domain (from 2- to 3-dimensional) then: 1) the optimization process changes slightly, and 2) the stopping condition for the algorithm changes. Thus you do not get identical results in the first two dimensions. – Bogumił Kamiński Dec 12 '22 at 07:29
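To make the "arbitrary third value" point concrete (my own sketch, using the original assertion-free sqerror): the function never reads betas[3], so the loss is identical no matter what sits in that slot, and the optimizer has nothing to push it toward:

# assumes the original sqerror without the @assert; betas[3] is never read
sqerror([0.7666, 2.1, -100.0], x, y) == sqerror([0.7666, 2.1, 100.0], x, y)  # true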