0

This may be a trivial question to answer, but I just want to be clear. From the available literature and the discussion in What is the difference between Gradient Descent and Newton's Gradient Descent?, both methods involve computing a derivative and then moving towards a minimum. In the plain gradient-descent method, we compute only the first-order derivative; in Newton's method, we also compute the second-order derivative (the Hessian, in the multivariate case) and apply it to the update vector. Moreover, the update in Newton's method may not always be in the direction of the negative gradient.
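
To make sure I have the two updates straight, here is a minimal 1-D sketch; the quartic function, learning rate, and starting point are just illustrative choices of mine, not anything from the literature:

```python
# Gradient descent uses only f'(x); Newton's method also divides by f''(x)
# (the 1-D analogue of applying the inverse Hessian).
def f_prime(x):       # first derivative of f(x) = (x - 3)^4, used by both methods
    return 4.0 * (x - 3.0) ** 3

def f_second(x):      # second derivative (the "Hessian" in 1-D), used only by Newton
    return 12.0 * (x - 3.0) ** 2

x_gd, x_newton, alpha = 0.0, 0.0, 0.01
for _ in range(50):
    x_gd     -= alpha * f_prime(x_gd)                    # gradient-descent update
    x_newton -= f_prime(x_newton) / f_second(x_newton)   # Newton update
print(x_gd, x_newton)  # both move towards x = 3; the Newton iterate is much closer after 50 steps
```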

Moreover, for a given function f(x), both methods attempt to find a minimum that satisfies f'(x) = 0; in the gradient-descent method, the objective is argmin f(x), whereas in Newton's method, the objective is f'(x) = 0. Another difference is the stopping criterion, which in the gradient-descent method is f'(x) = 0, whereas in Newton's method it is f(x) = 0.

Based on the above arguments, would it be justified to say that Newton's method is an (advanced) example of gradient-based optimisation methods? The discussion cited above also falls short of answering this question.

Sal
  • 35
  • 8
  • I'm voting to close this question as off-topic because it's not about programming. The question might be on-topic in the [Mathematics](https://math.stackexchange.com/) Stack Exchange site. – ottomeister Jan 19 '20 at 00:32
  • I agree it is not directly related to programming, but then again, it is: it addresses the very basic approach behind programming a possible solution. I would request you to please reconsider. – Sal Jan 19 '20 at 02:21

2 Answers

1

in the gradient-descent method, the objective is argmin f(x), whereas in Newton's method, the objective is f'(x) = 0

That is not the case; both objectives are f'(x) = 0. With gradient descent, just as with Newton's method, you have no information on whether the minimum you have reached is global or local, so argmin f(x) holds only within a very small neighbourhood.

Another difference is the stopping criterion, which in the gradient-descent method is f'(x) = 0, whereas in Newton's method it is f(x) = 0

Again, that is incorrect. Both methods are trying to minimize a cost function f(x), and there is no guarantee whatsoever that the minimum value of f(x) is zero. It could be any arbitrary value, so choosing f(x) = 0 as the stopping criterion would simply be wrong. A good criterion for stopping both methods is to look at how much f(x) changes over a few consecutive iterations. If it barely changes, you might conclude that you have reached a plateau and stop. As alternatives, you can use a criterion such as the norm of the gradient, or, if you have time constraints, simply a fixed number of iterations.
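
As a rough illustration of those stopping rules, here is a gradient-descent loop that stops either when f(x) barely changes, when the gradient norm is small, or when an iteration budget runs out. The quadratic test function, step size, and tolerances are arbitrary choices of mine, not something prescribed by either method:

```python
import numpy as np

def gradient_descent(f, grad, x0, alpha=0.1, tol_f=1e-10, tol_grad=1e-8, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):                     # fixed iteration budget
        g = grad(x)
        if np.linalg.norm(g) < tol_grad:          # gradient-norm criterion
            break
        x_new = x - alpha * g
        if abs(f(x_new) - f(x)) < tol_f:          # "f(x) barely changes" criterion
            x = x_new
            break
        x = x_new
    return x

# The minimum value of this f is 7, not 0, which is why f(x) = 0 would be a broken stopping test.
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2 + 7.0
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)])
print(gradient_descent(f, grad, [0.0, 0.0]))      # converges near [1, -2]
```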

would it be justified to say that Newton's method is an (advanced) example of gradient-based optimisation methods

By definition, a gradient method moves in the direction of the gradient. Newton's method, as you know, uses local curvature to define the path towards a local optimum and may not follow the gradient direction at all, so it just wouldn't make sense to call it gradient-based.
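
A quick sketch of that point, using an ill-conditioned quadratic f(x) = 0.5 * x^T A x (the matrix A and the evaluation point are arbitrary choices of mine): the steepest-descent direction -grad and the Newton direction -A^{-1} grad differ by a large angle.

```python
import numpy as np

A = np.array([[100.0, 0.0],
              [0.0,   1.0]])            # strongly anisotropic curvature
x = np.array([1.0, 1.0])

grad = A @ x                            # gradient of the quadratic at x
gd_dir = -grad                          # steepest-descent direction
newton_dir = -np.linalg.solve(A, grad)  # Newton direction (here exactly -x)

cos_angle = gd_dir @ newton_dir / (np.linalg.norm(gd_dir) * np.linalg.norm(newton_dir))
print(gd_dir, newton_dir, cos_angle)    # cos_angle is about 0.71, i.e. roughly 44 degrees apart
```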

Ash
  • 4,611
  • 6
  • 27
  • 41
  • 1
    I think this makes sense; I had this understanding from all the literature I could find on the topic. And the article "G. Venter. Review of Optimization Techniques. Encyclopedia of Aerospace Engineering, 2010" specifically classifies Newton's Method as a gradient-descent method. Any comments? – Sal Jan 19 '20 at 02:20
  • @Sal My apologies, I was actually thinking about the `Gauss-Newton` method when I wrote this... I think that my answer holds for `Newton`'s method too, though. Regarding the classification of the method as gradient-based, I think that is more or less a matter of choice... In `Newton`'s method the gradient does appear explicitly in the update, and it reduces to a (rescaled) gradient descent in dimensions where the second derivatives are constant, so I think it would make sense to classify it as a gradient method, although it seems like a stretch. – Ash Jan 20 '20 at 08:35
  • When the derivative is computed and appears explicitly in the update, why would it seem a stretch? I would say Newton's method does qualify as a gradient-based method. The way the update is applied may be different, but it works on the same underlying concept. – Sal Jan 21 '20 at 22:03
0

would it be justified to say that Newton's method is an (advanced) example of gradient-based optimisation methods?

I think that's definitely fair to say. For the simple 1-D case, I like to think of Newton's method as gradient descent with i) a step size (alpha in canonical gradient descent) equal to 1 and ii) an adjustment such that, holding the first derivative constant, the update is larger the smaller the curvature (i.e. the second derivative) of the function is at the current guess.
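
A tiny sketch of that reading (the example function and the evaluation point are just illustrative choices of mine): in 1-D the Newton step is the alpha = 1 gradient step divided by the local curvature f''(x).

```python
def f_prime(x):
    return 4.0 * x ** 3 - 3.0       # derivative of f(x) = x^4 - 3x

def f_second(x):
    return 12.0 * x ** 2            # curvature of the same f

x = 2.0
gd_step     = -1.0 * f_prime(x)            # "alpha = 1" gradient step
newton_step = -f_prime(x) / f_second(x)    # the same step, rescaled by 1 / f''(x)
print(gd_step, newton_step)  # the flatter the function (small f''), the larger the Newton step relative to it
```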

Matsaulait
  • 71
  • 2