
I came across the basin hopping algorithm in scipy and created a simple problem to understand how to use it, but it doesn't seem to be working correctly for that problem. Maybe I'm doing something completely wrong.

Here is the code:

import scipy.optimize as spo
import numpy as np

minimizer_kwargs = {"method": "BFGS"}
f1 = lambda x: (x - 4)

def mybounds(**kwargs):
    x = kwargs["x_new"]
    tmax = bool(np.all(x <= 1.0))
    tmin = bool(np.all(x >= 0.0))
    print(x)
    print(tmin and tmax)
    return tmax and tmin

def print_fun(x, f, accepted):
    print("at minima %.4f accepted %d" % (f, int(accepted)))

x0 = [1.]
spo.basinhopping(f1, x0, accept_test=mybounds, callback=print_fun,
                 niter=200, minimizer_kwargs=minimizer_kwargs)

The solution it gives is x: array([ -1.80746874e+08])

akhil
  • Try a quadratic penalty for x outside the bounds: [scipy-optimize-leastsq-with-bound-constraints](http://stackoverflow.com/questions/9878558/scipy-optimize-leastsq-with-bound-constraints) – denis May 23 '14 at 08:42

3 Answers


The function you are testing makes use of an approach called Metropolis-Hastings, which can be modified into a procedure called simulated annealing that can optimize functions in a stochastic way.

The way this works is as follows. First you pick a point, like your point x0. From that point, you generate a random perturbation (this is called a "proposal"). Once there is a proposed perturbation, you get your candidate for a new point by applying the perturbation to your current point. So, you could think of it like x1 = x0 + perturbation.

In regular old gradient descent, the perturbation term is just a deterministically calculated quantity, like a step in the direction of the gradient. But in Metropolis-Hastings, perturbation is generated randomly (sometimes by using the gradient as a clue about where to randomly go... but sometimes just randomly with no clues).

At this point when you've got x1, you have to ask yourself: "Did I do a good thing by randomly perturbing x0 or did I just mess everything up?" One part of that has to do with sticking inside of some bounds, such as your mybounds function. Another part of it has to do with how much better/worse the objective function's value has become at the new point.

So there are two ways to reject the proposal of x1: first, it could violate the bounds you have set and be an infeasible point by the problem's definition; second, from the accept/reject assessment step of Metropolis-Hastings, it could just be a very bad point that should be rejected. In either case, you will reject x1 and instead set x1 = x0 and pretend as if you just stayed in the same spot to try again.

Contrast that with a gradient-type method, where you'll definitely, no matter what, always make at least some kind of movement (a step in the direction of the gradient).
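To make that concrete, here is a minimal sketch of the propose/accept/reject loop in plain numpy. This is my own toy illustration (assuming f returns a scalar), not scipy's implementation:

import numpy as np

def metropolis_minimize(f, x0, stepsize=0.5, T=1.0, niter=100, seed=0):
    # Toy Metropolis loop for minimization: propose, then accept or reject.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(niter):
        x_new = x + rng.uniform(-stepsize, stepsize, size=x.shape)  # x1 = x0 + perturbation
        f_new = f(x_new)
        # Always accept an improvement; accept a worse point only with a
        # probability that shrinks the worse it is (the Metropolis criterion).
        if f_new <= fx or rng.random() < np.exp((fx - f_new) / T):
            x, fx = x_new, f_new   # accept the proposal
        # otherwise reject: keep x, i.e. "pretend you stayed in the same spot"
    return x, fx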

Whew, all right. With all that aside, let's think about how this plays out with the basinhopping function. From the documentation we can see that the step-taking behavior is controlled by the take_step argument, about which the documentation says: "The default step taking routine is a random displacement of the coordinates, but other step taking algorithms may be better for some systems." So, even apart from your mybounds checker, the function will be making a random displacement of the coordinates to generate each new point to try. And since the gradient of this function is just the constant 1, it will always be making the same big steps in the direction of the negative gradient (for minimizing).

At a practical level, this means that the proposed points for x1 are always going to land far outside the interval [0,1], and your bounds checker will always veto them.
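As an aside, you can replace that default displacement with your own step-taker via the take_step argument. Here is a hedged sketch of a bounded one (my illustration; the clipping strategy is just one possible choice, and it does not constrain where the local minimizer subsequently wanders from the proposal):

import numpy as np

class BoundedStep(object):
    # A custom take_step: random displacement clipped back into [0, 1].
    def __init__(self, stepsize=0.5):
        self.stepsize = stepsize  # basinhopping adapts this attribute if present
    def __call__(self, x):
        x = x + np.random.uniform(-self.stepsize, self.stepsize, np.shape(x))
        return np.clip(x, 0.0, 1.0)

# usage: spo.basinhopping(f1, x0, take_step=BoundedStep(), accept_test=mybounds,
#                         niter=200, minimizer_kwargs=minimizer_kwargs)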

When I run your code, I see this happening all the time:

In [5]: spo.basinhopping(f1,x0,accept_test=mybounds,callback=print_fun,niter=200,minimizer_kwargs=minimizer_kwargs)
at minima -180750994.1924 accepted 0
[ -1.80746874e+08]
False
at minima -180746877.5530 accepted 0
[ -1.80746873e+08]
False
at minima -180746877.3896 accepted 0
[ -1.80750991e+08]
False
at minima -180750994.7281 accepted 0
[ -1.80746874e+08]
False
at minima -180746878.2433 accepted 0
[ -1.80746874e+08]
False
at minima -180746877.5774 accepted 0
[ -1.80746874e+08]
False
at minima -180746878.3173 accepted 0
[ -1.80750990e+08]
False
at minima -180750994.3509 accepted 0
[ -1.80750991e+08]
False
at minima -180750994.6605 accepted 0
[ -1.80746874e+08]
False
at minima -180746877.6966 accepted 0
[ -1.80746874e+08]
False
at minima -180746877.6900 accepted 0
[ -1.80750990e+08]
False
at minima -180750993.9707 accepted 0
[ -1.80750990e+08]
False
at minima -180750994.0494 accepted 0
[ -1.80750991e+08]
False
at minima -180750994.5824 accepted 0
[ -1.80746874e+08]
False
at minima -180746877.5459 accepted 0
[ -1.80750991e+08]
False
at minima -180750994.6679 accepted 0
[ -1.80750991e+08]
False
at minima -180750994.5823 accepted 0
[ -1.80750990e+08]
False
at minima -180750993.9308 accepted 0
[ -1.80746874e+08]
False
at minima -180746878.0395 accepted 0
[ -1.80750991e+08]
False

# ... etc.

So it's never accepting the proposal points. The output is not telling you that it has found a solution. It's telling you that the random perturbations used to explore possible solutions keep resulting in points that look better and better to the optimizer, but which keep failing to satisfy your criteria. It can't find its way all the way back to [0,1] to get points that do satisfy mybounds.

ely

The behavior of basin-hopping as you've coded it is to combine perturbations with local minimization.

Your routine keeps producing unacceptable points because of the local optimization piece. Essentially, the BFGS routine you're using is completely unconstrained, so it follows the gradient out toward negative infinity. This result then gets fed back into your checker.

So it doesn't matter where your basin-hopping point x1 is: the BFGS part always runs off to a massive negative value.

The benchmark function x - 4 that you're using is not the ideal target here. Check out e.g. the Rastrigin function. If you actually need to optimize a linear function, there's a whole class of algorithms to do that (see Linear Programming on Wikipedia).
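If you do want to keep your setup, one sketch of a fix along these lines (my illustration, not the OP's code) is to hand the bounds directly to the local minimizer: scipy's minimize supports the L-BFGS-B method, which accepts a bounds argument, and basinhopping forwards minimizer_kwargs to it:

import scipy.optimize as spo

f1 = lambda x: x[0] - 4.0  # return a scalar, as the local minimizer expects
minimizer_kwargs = {"method": "L-BFGS-B", "bounds": [(0.0, 1.0)]}
result = spo.basinhopping(f1, x0=[1.0], niter=200,
                          minimizer_kwargs=minimizer_kwargs)
print(result.x)  # the constrained minimum of x - 4 on [0, 1] is at x = 0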

Yike Lu
  • I just don't see why someone downvoted this answer. prpl.mnky.dshwshr just missed the point. As Yike Lu said, "The reason your routine keeps producing unacceptable points is related to the gradient though, but it's because of the local optimization piece, not the perturbation piece. " – zell Jan 17 '15 at 16:12
  • @zell I downvoted this answer because it makes some factually inaccurate claims. First, in my answer, I did not claim that the basinhopper follows the gradient. In fact, I highlighted in the documentation where it says that it *does not* follow the gradient. I mentioned the bit about the gradient to explain what you *could* do. For example, the No-U-Turn Sampler or Hybrid Monte Carlo are techniques that do Metropolis *and also* use the gradient to help. But I specifically said that in this case, the perturbation is just random, as per the documentation I quoted. – ely Feb 12 '15 at 15:31
  • Further, I also had already explained precisely this issue of the gradient leading to massive negative values, thus preventing the perturbations from possibly remaining in the feasible region. The Metropolis algorithm alone, without any regard to any local optimizer, would walk towards negative infinity in this case, just by virtue of how its accept/reject works. I felt this was a downvote-worthy answer because (a) it only restated exactly what was already in my answer, with less explanation, and (b) it actively falsely claimed that I did not make the explanations in my answer. – ely Feb 12 '15 at 15:32
  • Can you please clarify the factually inaccurate pieces, with regard to what basin-hopping as implemented in SciPy does? I don't mind the downvote or anything, but am curious to be corrected. As I understand it, the local minimization piece is what follows the negative gradient, not the global perturbation piece. The reason I posted originally was that I felt that distinction was unclear, although I see that I should have been more careful before saying that your post was inaccurate. – Yike Lu Feb 12 '15 at 19:08
  • The problem I have with your description is that it *does not matter what type of local minimization is used, or even if none at all is used*. Even if `scipy.optimize.basinhopping` *turned off completely all local optimization steps* the OP's original problem would still happen purely because Metropolis-Hastings accept/reject step *itself* will walk along the gradient of the function over time. That would be precisely unaided simulated annealing. If one wants, one *can* add local optimizers that do cheap, localized traditional optimization, but *you don't have to*. – ely Feb 13 '15 at 16:58
  • I understand now. Basically the accept/reject will result in gradient walking because it will only accept perturbations that improve the objective function. Actually in that case, MH likely won't go out of bounds, so I guess my answer does have some incremental value. Will edit to reflect. – Yike Lu Feb 13 '15 at 20:27
  • @YikeLu What makes you say "Actually in that case, MH likely won't go out of bounds..."? The *opposite* is true. MH would be *very likely* to go out of bounds in this case. In these types of constrained optimization problems, MH is usually only applicable if it is very straightforward to *draw samples from* some distribution that is proven to automatically satisfy the bounds *and also* has the necessary gradient for optimization. It's extremely hard to come up with these in practice (an example is sampling from matrices that have fixed row or column sums as constraints ... *very* hard). – ely Feb 21 '15 at 21:14
  • In other words, when you say in your answer, "Essentially the BFGS routine you're using is completely unconstrained, so it follows the gradient out to negative infinity. This result then gets fed back into your checker." this is just wrong. It *would not matter* if we just turned off the local BFGS optimizer completely, the same problem of going out of bounds (and thus MH never accepting any proposal point) would *still* happen. You can only solve the problem by directly generating proposals that satisfy the bounds implicitly, which would thereby automatically solve the local optimizer problem – ely Feb 21 '15 at 21:18
  • OP has a custom accept/reject test. If he simply randomly picked points with no local optimization, his optimizer would stay in bounds. Also, if you did the logical AND between the MH test and OP's bounds test, the same thing would occur and there'd be convergence, as the custom piece of the optimizer would reject the point. – Yike Lu Feb 23 '15 at 21:36
  • In the case with the local optimizer turned off, why would random points stay in bounds? There is no reason to think that random proposal points would stay in bounds. Unless you define your random sampling procedure to natively draw only from the region of admissible points, it's totally possible that you'll just generate proposals outside of your bounds indefinitely. Your chain of samples would then just be one long chain of repetitions of the same singleton value, whatever the starting value happened to be. If the proposal distribution is poor enough, this could go on for a long time. – ely Apr 01 '15 at 20:26
  • I wasn't commenting on the generic algorithm, but specifically what's implemented. It's an `x0 + delta` perturbation with random (normal?) `delta` as I understand it. It's possible that you would still wander out of bounds indefinitely, but very very unlikely. – Yike Lu Apr 06 '15 at 14:51
  • I agree that the local optimizer is taking points and running with them which always leads to something the bounds will reject. I'm just saying even if this wasn't true, the underlying idea of basically rejection sampling an otherwise-Metropolis algorithm with some bounds is the bigger conceptual problem. As one commenter wrote, you could instead add a constraint to penalize proposals outside of the range, which gives up guaranteed rejection of out-of-bounds points, but also allows probability that the algorithm will favor being in the bounds in the end. – ely Apr 06 '15 at 18:24
  • Here's the mechanism I see... optimizer picks `x0 + delta`. If it's not in bounds or it doesn't improve the objective, it rejects it. If it improves and stays in bounds, that becomes the new `x0`. So even with a uniform distribution, over a large number of draws it becomes extremely unlikely that every single `delta` will result in out of bounds. Can we agree on that? Kind of hard to parse the differences this deep into the comment thread. – Yike Lu Apr 08 '15 at 18:15
  • No that's not right. The algorithm picks `x0 + delta` and then, whether in bounds or not, executes the local deterministic optimizer from that point (which runs away to minus infinity in this case). After it hits the maximum iteration limit (or in other cases, when it finds an optimum), *then* it checks if it's in bounds and whether it improves the fitness function or not, and does Metropolis accept/reject (you can see it [here](https://github.com/scipy/scipy/blob/master/scipy/optimize/_basinhopping.py#L92)). – ely Apr 08 '15 at 19:01
  • If that whole thing doesn't accept, then the adaptive stepper adjusts downward by a factor of 0.9, and you'll uniformly draw a sample from [0.55, 1.45] (from the original point again, having still a 0.5 chance of not being in the interval). I do agree that this alone, with enough repeated draws, makes the probability infinitesimal that no such draw was inside of [0,1]. It is curious in reading the code how it is even possible that it outputs the answer the OP gives, rather than always the initial point. – ely Apr 08 '15 at 19:12
  • All right, I think we finally have hashed this out as fully as we can. I agree with what you said. – Yike Lu Apr 09 '15 at 16:02

Yike Lu has already pointed the problem out: your bounds are only enforced at the top level, but the local optimizer BFGS knows nothing about them.

In general, it is often a bad strategy to use "hard" bounds in an optimization, because with most algorithms a search path that leads toward an optimum sitting directly on the border of your permitted space is never allowed to touch that border: the moment it does, the step is rejected or the search is terminated. You can see how hard it is to find the optimum in your case above (x=0) without ever trying x=-0.0000001, finding that you went a tad too far, and walking back a little. Now, there are algorithms which can do this by transforming the input data (in scipy.optimize, those are the ones which accept bounds as an argument), but the general solution is this:

You update your cost function to increase very quickly in case the input moves out of the permitted scope:

import numpy as np

def f1(x):
    x = float(np.squeeze(x))  # basinhopping hands the cost function a 1-element array
    cost_raw = x - 4
    # steep polynomial penalty for leaving the permitted interval [0, 1]
    if x >= 1.0:
        cost_overrun = (1000 * (x - 1)) ** 8
    elif x <= 0.0:
        cost_overrun = (1000 * (-x)) ** 8
    else:
        cost_overrun = 0.0
    return cost_raw + cost_overrun

That way, any optimizer will see the cost function increase as soon as it oversteps the bounds, and will head right back into the permitted space. This is not a strict enforcement, but optimizers are iteratively approximating anyway, so depending on how strict you need it, you can adapt the penalty function to make the increase more or less fuzzy. Some optimizers will prefer to have a continuous derivative (hence the power function), some will be happy to deal with a fixed step -- in that case you could simply add 10000 whenever you're out of bounds.
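For example, running basinhopping against the penalized f1 above with the question's original settings (a quick sketch; the exact numbers will differ from run to run):

import scipy.optimize as spo

x0 = [1.0]
result = spo.basinhopping(f1, x0, niter=200,
                          minimizer_kwargs={"method": "BFGS"})
print(result.x)  # should now land near the constrained optimum x = 0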

Zak