
Hi, I am working on implementing gradient descent with backtracking line search. However, when I try to update f(x0), the value doesn't change. Could it be something to do with the lambda expressions? I am not too familiar with them.

import numpy as np
import math
alpha = 0.1
beta = 0.6

f = lambda x: math.exp(x[0] + 3*x[1] - 0.1) + math.exp(x[0] - 3*x[1] -0.1) + math.exp(-1*x[0] - 0.1)
dfx1 = lambda x: math.exp(x[0] + 3*x[1] - 0.1) + math.exp(x[0] - 3*x[1] -0.1) - math.exp(-x[0] - 0.1)
dfx2 = lambda x: 3*math.exp(x[0] + 3*x[1] - 0.1) - 3*math.exp(x[0] - 3*x[1] -0.1) 

t = 1
count = 1
x0 = np.array([1.0,1.0])
dx0 = np.array([1e-3, 1e-3])

x = []
d = np.array([-1*dfx1(x0),-1*dfx2(x0)]);

grad = np.array([1*dfx1(x0),1*dfx2(x0)])
def backtrack(x0, dfx1, dfx2, t, alpha, beta, count):
    while (f(x0 + t*d) > f(x0) + alpha*t*np.dot(d,grad) or count < 50 ):
        d[0] = -1*dfx1(x0);
        d[1] = -1*dfx2(x0);
        grad[0] = dfx1(x0);
        grad[1] = dfx2(x0);
        x0[0] = x0[0] + t*d[0];
        x0[1] = x0[1] + t*d[1];
        
        t *= beta;
        count += 1
        x.append(f(x0));
    return t

t = backtrack(x0, dfx1, dfx2, t, alpha, beta,count)

print("\nfinal step size :",  t)

print(np.log(x))

print(f(x0))

1 Answer


Update for new code

OK, your numbers now change too much!

When writing these routines, stepping through the code with a debugger is really useful to check that the code is doing what you want. In this case you'll see that on the second pass through the loop, x0 = [-1.32e+170, 3.96e+170]. Taking the exponential of that is what causes the problem. If you don't have a debugger, try printing some values.
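To see what goes wrong, here is a minimal sketch using those values (taken from one debugger run, so treat them as illustrative rather than exact):

import math

# x0 after the second pass through the loop (values from the debugger run above)
x0 = [-1.32e+170, 3.96e+170]

# the exponent in the first term of f is then roughly 1.06e+171, far beyond the
# ~709 limit of math.exp, so evaluating f(x0) raises "math range error"
exponent = x0[0] + 3 * x0[1] - 0.1
try:
    math.exp(exponent)
except OverflowError as err:
    print(err)   # math range error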

What can you do to fix it? One fix is to reduce t. Starting with t=1e-3 stops the problem.

However, I suspect that you have more issues in store. I don't know the exact technique you are trying to implement, but t probably shouldn't just shrink by a fixed ratio on every step, and the np.dot(...) term with anti-parallel vectors also looks suspect.
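If what you are after is the usual Armijo/backtracking rule, the structure is typically: at each outer iteration reset t to 1, shrink it by beta only until the sufficient-decrease condition holds, and only then take the step. Here is a minimal sketch of that (my reading of the textbook algorithm, not necessarily what your assignment asks for):

import numpy as np

alpha, beta = 0.1, 0.6

def f(x):
    return (np.exp(x[0] + 3*x[1] - 0.1)
            + np.exp(x[0] - 3*x[1] - 0.1)
            + np.exp(-x[0] - 0.1))

def grad_f(x):
    a = np.exp(x[0] + 3*x[1] - 0.1)
    b = np.exp(x[0] - 3*x[1] - 0.1)
    c = np.exp(-x[0] - 0.1)
    return np.array([a + b - c, 3*(a - b)])

x = np.array([1.0, 1.0])
for _ in range(50):                                # outer descent iterations
    g = grad_f(x)
    d = -g                                         # steepest-descent direction
    t = 1.0                                        # reset the step size every iteration
    # backtrack: shrink t until the sufficient-decrease (Armijo) condition holds;
    # np.exp returns inf (with a warning) rather than raising, so an overflow here
    # just makes the condition fail and t keeps shrinking
    while f(x + t*d) > f(x) + alpha*t*np.dot(g, d):
        t *= beta
    x = x + t*d                                    # take the step only after backtracking

print(x, f(x))   # should approach roughly [-0.3466, 0], f about 2.559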

If the implementation is not the point of this, is there a library function that does what you want? For example, if you just want to minimise f from the starting point (1, 1), see SciPy's minimize function [https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html]:

import scipy.optimize
import numpy as np

def get_exponentials(x):
    a = np.exp(x[0] + 3 * x[1] - 0.1)
    b = np.exp(x[0] - 3 * x[1] - 0.1)
    c = np.exp(-x[0] - 0.1)
    return a, b, c
def dfdx(x):
    a, b, c = get_exponentials(x)
    return np.array([a + b - c, 3 * (a - b)])
def f(x):
    a, b, c = get_exponentials(x)
    return a + b + c

scipy.optimize.minimize(f, np.array([1.0, 1.0]), jac=dfdx)
# x = [-3.46573682e-01, -5.02272755e-08], f = 2.559267

If line search is particularly important to you there is [https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.line_search.html]:

scipy.optimize.line_search(f, dfdx, np.array([1.0, 1.0]), np.array([-1.0, -1.0]))
# alpha 1.0,
# number of function calls 2,
# number of gradient calls 1,
# new function value 2.7145122541078788,
# old function value 49.85777661748123,
# new gradient [0.90483742, 0.]

Finally - you said initially that you weren't comfortable with the lambdas - there is no need to use them here at all. Lambdas are not usually the right tool for creating named functions; an ordinary def works just as well. See [https://stackoverflow.com/questions/134626/which-is-more-preferable-to-use-lambda-functions-or-nested-functions-def] for more discussion.
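For example, the first lambda in your code could be written as an ordinary function, which also gives it a proper name in tracebacks:

import math

def f(x):
    # same expression as the lambda, just as a named function
    return (math.exp(x[0] + 3*x[1] - 0.1)
            + math.exp(x[0] - 3*x[1] - 0.1)
            + math.exp(-x[0] - 0.1))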

wtw
  • When I set x0 to [1.0, 1.0] I get a math range error. I want to start from that point; any ideas on how to solve it? Did I do the gradient descent correctly otherwise? I am going to edit with my changes. – DockingBlade Mar 30 '21 at 19:44
  • Cancel that - I've just seen your new code. – wtw Mar 30 '21 at 20:03