
I am working with the R programming language. I am trying to perform Stochastic Gradient Descent on custom defined functions.

For instance, here is an example of using Gradient Descent to optimize a custom function (using the well established "pracma" library):

# define function:

  Rastrigin <- function(x)
    {
        return(20 + x[1]^2 + x[2]^2 - 10*(cos(2*pi*x[1]) + cos(2*pi*x[2])))
    }

# run gradient descent:

library(pracma)

> steep_descent(c(1, 1), Rastrigin)

$xmin
[1] 0.9949586 0.9949586

$fmin
[1] 1.989918

$niter
[1] 3

Now, I am trying to run Stochastic Gradient Descent on this same function. I found packages that allow for Stochastic Gradient Descent (e.g. https://www.rdocumentation.org/packages/sgd/versions/1.1.1, https://rdrr.io/cran/torch/man/optim_rmsprop.html) - but these seem to be more suited to objective functions that come from pre-existing statistical and machine learning models. I also tried looking for popular variants of Stochastic Gradient Descent such as ADAGRAD or RMSPROP, but there does not seem to be any straightforward method for running Stochastic Gradient Descent on a custom defined function.

For instance - suppose I wanted to run Stochastic Gradient Descent on the "Rastrigin" function that I defined above; how would I do this?
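
For what it's worth, here is the kind of loop I imagine might work with the torch package's autograd and its optim_rmsprop optimizer - but I am not sure whether this is the intended usage, and the learning rate, the iteration count and the idea of rewriting the Rastrigin function with torch operations are all my own guesses:

library(torch)

# Rastrigin rewritten with torch operations so that autograd can differentiate it
rastrigin_torch <- function(x) {
  20 + x[1]^2 + x[2]^2 - 10 * (torch_cos(2 * pi * x[1]) + torch_cos(2 * pi * x[2]))
}

# starting point, stored as a tensor that tracks gradients
x <- torch_tensor(c(1, 1), requires_grad = TRUE)

# RMSPROP optimizer acting directly on x (learning rate is an arbitrary guess)
opt <- optim_rmsprop(params = list(x), lr = 0.01)

for (i in 1:500) {
  opt$zero_grad()               # clear gradients from the previous step
  loss <- rastrigin_torch(x)
  loss$backward()               # autograd computes d(loss)/dx
  opt$step()                    # RMSPROP update of x
}

as.numeric(x$detach())                      # current estimate of the minimiser
as.numeric(rastrigin_torch(x)$detach())     # objective value at that estimate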

Thanks!

Note: I understand that performing Gradient Descent on a function requires knowledge of the function's derivatives. From this Stackoverflow post (Explicit formula versus symbolic derivatives in R), we can obtain the derivatives of the Rastrigin function:

#load libraries
library(Ryacas0)
library(Ryacas)

#define Rastrigin function (here I am defining the function in "x" and "y" instead of "x[1]" and "x[2]")

x <- Sym("x")
y <- Sym("y")
z <- 20 + x^2 + y^2 - 10*(cos(2*pi*x) + cos(2*pi*y))

#first derivative with respect to x (note: 2 * pi ≈ 6.2832)
dx <- deriv(z, x, 1) 

dx

yacas_expression(2 * x - -62.83185307 * sin(6.28318530717959 * x))

#first derivative with respect to y
dy <- deriv(z, y, 1)

dy

yacas_expression(2 * y - -62.83185307 * sin(6.28318530717959 * y))
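
As a sanity check of my own, the yacas output above can be transcribed back into an ordinary R function and compared against pracma's numerical gradient; the point c(0.3, -0.7) below is just an arbitrary test point I picked:

# gradient of the Rastrigin function, transcribed from the yacas output above
# (10 * 2*pi = 62.8318...)
rastrigin_grad <- function(x) {
  c(2 * x[1] + 20 * pi * sin(2 * pi * x[1]),
    2 * x[2] + 20 * pi * sin(2 * pi * x[2]))
}

# compare against pracma's numerical gradient at an arbitrary point
library(pracma)
grad(Rastrigin, c(0.3, -0.7))   # numerical gradient
rastrigin_grad(c(0.3, -0.7))    # analytic gradient - should agree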

Now that we know the first derivatives of the Rastrigin function with respect to "x" and "y" - can we write a function that performs Stochastic Gradient Descent on the Rastrigin function in R?
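
For reference, below is a rough sketch of the kind of function I have in mind, using the Rastrigin function defined at the top and the rastrigin_grad helper above. Since there are no data points to sample from, the "stochastic" part here is just optional Gaussian noise added to the gradient, and the learning rate, noise level and iteration count are all arbitrary choices of mine:

# a sketch of a (noisy) gradient descent loop for an arbitrary function f
# with gradient grad_f; noise_sd = 0 gives plain gradient descent
sgd_custom <- function(f, grad_f, x0, lr = 0.01, n_iter = 2000, noise_sd = 0.5) {
  x <- x0
  for (i in seq_len(n_iter)) {
    g <- grad_f(x) + rnorm(length(x), sd = noise_sd)  # noisy gradient estimate
    x <- x - lr * g                                   # descent step
  }
  list(xmin = x, fmin = f(x), niter = n_iter)
}

set.seed(123)
sgd_custom(Rastrigin, rastrigin_grad, c(1, 1))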

  • What you are looking for is called Automatic Differentiation, and most DL frameworks use it. – Dr. Snoopy Feb 15 '22 at 08:19
  • @Dr. Snoopy: thank you for your reply! If you have time, could you please show me how to optimize this function? Thanks! – stats_noob Feb 15 '22 at 14:16
  • Don't think that SGD is really pertinent here. You use it when your objective function is a sum of many independent terms and you just evaluate the gradient on the sum of a subset of those terms. The terms above are usually the cost function calculated for each data point of a dataset. Here you have neither data points nor an objective function with many terms, so I don't think using SGD makes much sense. – nicola Feb 17 '22 at 14:03
  • @nicola: thank you for your reply! I understood your point; I was just interested in seeing if it could work anyway for a problem like this. – stats_noob Feb 17 '22 at 15:04
  • I agree with @nicola. SGD is meant for fitting a model to data, where the objective function is the sum of squared errors over hundreds of data points ((actual - predicted)^2). To minimize this function, I would recommend optim from base R with method = "BFGS". – Arthur Feb 22 '22 at 14:23
  • @Arthur: thank you for your reply! I also fully agree ... I just wanted to see if this can be done anyway. – stats_noob Feb 22 '22 at 15:06
