I have an objective function from a paper that I would like to minimize with gradient descent. I have never had to do this "from scratch" before and would like some advice on how to code it up manually. The objective function is:
T(L) = tr(X.T L^s X) - beta * ||L||^2,
where L is an N x N positive semidefinite matrix to be estimated, X is an N x M matrix, beta is a regularization constant, X.T is the transpose of X, and ||.|| is the Frobenius norm (squared here, which is consistent with the -2 * beta * L term in the derivative below).
Also, L^s is the matrix power, computed via the eigendecomposition L^s = F Λ^s F.T, where F is the matrix of eigenvectors of L and Λ is the diagonal matrix of its eigenvalues.
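To make sure I understand the pieces, here is a rough NumPy sketch of how I would evaluate L^s and the objective. The function names are mine, not from the paper, and I am assuming s is a non-negative integer and L is kept symmetric PSD:

```python
import numpy as np

def matrix_power_sym(L, s):
    # L^s = F diag(lam**s) F.T via the eigendecomposition of symmetric L;
    # eigh is NumPy's symmetric eigensolver, so F has orthonormal columns
    lam, F = np.linalg.eigh(L)
    return (F * lam**s) @ F.T   # same as F @ np.diag(lam**s) @ F.T

def objective(L, X, beta, s):
    # T(L) = tr(X.T L^s X) - beta * ||L||_F^2
    Ls = matrix_power_sym(L, s)
    return np.trace(X.T @ Ls @ X) - beta * np.linalg.norm(L, "fro")**2
```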
The derivative of the objective function is:
dT/dL = sum_{r=0}^{s-1} L^r (X X.T) L^(s-r-1) - 2 * beta * L
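I think this formula translates fairly directly into code, something like the following (again just a sketch):

```python
def gradient(L, X, beta, s):
    # dT/dL = sum_{r=0}^{s-1} L^r (X X.T) L^(s-r-1) - 2 * beta * L
    A = X @ X.T
    grad = np.zeros_like(L)
    for r in range(s):
        grad += np.linalg.matrix_power(L, r) @ A @ np.linalg.matrix_power(L, s - r - 1)
    return grad - 2.0 * beta * L
```

One thing I am unsure about is efficiency: the powers L^0, ..., L^(s-1) could probably be precomputed once per iteration instead of calling matrix_power repeatedly inside the loop.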
I have done very rudimentary gradient descent problems (such as matrix factorization), where optimization is done over every element of the matrix, or I have relied on packages/libraries. This kind of problem is more complex than what I am used to, and I was hoping that some of you who are much more experienced with this sort of thing could help me out.
Any general advice is much appreciated, as well as specific recommendations on how to code this up in Python or R.
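For concreteness, this is the overall descent loop I have in mind, using the gradient function sketched above. Since L is constrained to be positive semidefinite, I am guessing each gradient step needs to be followed by a projection back onto the PSD cone (clipping negative eigenvalues to zero); that projection step is my own guess, not something taken from the paper:

```python
def project_psd(L):
    # symmetrize, then clip negative eigenvalues to zero
    lam, F = np.linalg.eigh((L + L.T) / 2)
    return (F * np.clip(lam, 0.0, None)) @ F.T

def minimize_T(X, beta, s, eta=1e-3, iters=500):
    L = np.eye(X.shape[0])   # arbitrary PSD starting point
    for _ in range(iters):
        # gradient descent step on T, then project back onto the PSD cone
        L = project_psd(L - eta * gradient(L, X, beta, s))
    return L
```

The step size eta and iteration count are placeholders; I would also check that T(L) actually decreases across iterations. Does this look like a reasonable structure, or is there a standard way to handle the PSD constraint?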
Here is the link for the paper with this function: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0128136#sec016
Thank you very much for your help!
Paul