storing each iteration in a gradient descent function in order to visualise the parameter updating process, and the cost covergence process in r

Question

I am trying to write a batch gradient decent function in r in to use on a training and test set of data. So far I have the below code. However, when I run it, it only prints out the last parameters and the iteration it ran. I would like to store each iteration, test error and be able to visualise the cost convergence process. I am not sure where to put or how to incorporate the code into the function below.

GradD <- function(x, y, alpha = 0.006, epsilon = 10^-10){
  iter <- 0
  i <- 0
  x <- cbind(rep(1, nrow(x)), x)
  theta <- matrix(c(1,1), ncol(x), 1)
  cost <- (1/(2*nrow(x)))* t(x%*% theta - y) %*% (x %*% theta - y)
  delta <- 1
  while (delta > epsilon){
    i <- i + 1
    theta <- theta - (alpha / nrow(x)) * (t(x) %*% (x %*% theta - y))
    cval <- (1/(2*nrow(x))) * t(x %*% theta - y) %*% (x %*% theta - y)
        cost <- append(cost, cval)
    delta <- abs(cost[i+1] - cost[i])
    if((cost[i+1] - cost[i]) > 0){
      print("The cost is increasing.  Try reducing alpha.")
      return()
    }
    iter <- append(iter, i)
  }
  print(sprintf("Completed in %i iterations.", i))
  return(theta)
}

TPredict <- function(theta, x){
  x <- cbind(rep(1,nrow(x)), x)
  return(x %*% theta)
}

EDIT I have tried to create a list that holds each iteration... however now i get errors when i run the code

error.cost <- function(x, y, theta){
  sum( (X %*% theta - y)^2 ) / (2*length(y))
}
num_iters <- 2000

cost_history <- double(num_iters)
theta_history <- list(num_iters)

GradD <- function(x, y, alpha = 0.006, epsilon = 10^-10){
  iter <- 2000
  i <- 0
  x <- cbind(rep(1,nrow(x)), x)
  theta <- matrix(c(1,1),ncol(x),1)
  cost <- (1/(2*nrow(x))) * t(x %*% theta - y) %*% (x %*% theta - y)
  delta <- 1
  while(delta > epsilon){
    i <- i + 1
    theta <- theta - (alpha / nrow(x)) * (t(x) %*% (x %*% theta - y))
    cval <- (1/(2*nrow(x))) * t(x %*% theta - y) %*% (x %*% theta - y)
    cost <- append(cost, cval)
    delta <- abs(cost[i+1] - cost[i])
    cost_history[i] <- error.cost(x, y, theta)
    theta_history[[i]] <- theta
    if((cost[i+1] - cost[i]) > 0){
      print("The cost is increasing.  Try reducing alpha.")
      return()
    }
    iter <- append(iter, i)
  }
  print(sprintf("Completed in %i iterations.", i))
  return(theta)
}

I get error in nrow(x) %% theta: non-conformable arguments. If i remove the nrow() in this function:

error.cost <- function(x, y, theta){
  sum( (x %*% theta - y)^2 ) / (2*length(y))
}

then it prints out results but they are the wrong final results and I dont have the iterations stored at all

Hi, think variables vs vectors or lists, per [empty vectors append](https://stackoverflow.com/questions/22235809/append-value-to-empty-vector-in-r). You're currently using variables, so it will contain last value. If you know the number of iterations you're going to perform, pre-allocate an empty vector of length(iter). If you want all of the parameters for later plotting, assessment, return a list. HTH — Chris, Jul 19 '20 at 20:10
How would you go about it if you don't know the number of iterations that will be performed? and would I make a separate list that appends theta in the for loop? — Beth Hooper, Jul 20 '20 at 02:02
You can overshoot, `iter <- vector(mode = 'integer', length = 10000),` then strip out the NA for the unused after. This pre-allocation prevents a bunch of write/rewrite. So at the top of your function you pre-allocate for all your different variables(iier, i, x, theta, cost, delta, cval), alpha and epsilon are constants. Seems your using cost as a kind of stop, reconsider value. I think preallocating theta handles it with `theta <- append(theta -...`, the return(list(theta & etc after your print(sprintf( is just if you're returning more than one thing. — Chris, Jul 20 '20 at 04:36
I'm not sure I understand what you mean by theta ```<- append(theta -....``` do i need to make a bigger matrix when i allocate theta at the start? — Beth Hooper, Jul 20 '20 at 06:57
Sorry, no, I didn't mean theta needs to be any bigger, nor appended to, I wasn't paying strict attention. I was speaking about the things that could be vectors, and `return(list(theta, cost, delta, cval) & etc` so you have the series available for your analysis as that seemed what you were asking for. — Chris, Jul 20 '20 at 12:03
so i would make everything a vector instead? eg,``` theta_history <- vector(mode = 'interger', length = 3000)``` then in my while loop have ```theta <- c(theta_history, theta - (alpha / nrow(x)) * (t(x) %*% (x %*% theta - y))``` ? — Beth Hooper, Jul 21 '20 at 00:32
Consider where I stepped in the door, capture history. I suggest vector as a container to catch things. Theta, as used in the gradient descent, is a matrix. But looking above that, what are x & y? And what history are you trying to assess, something like plotting toward convergence (I'm just making that up). Put some x, y data if you would as it will help me understand how this flows `dput(head(x, n=10 or 20))`. Thanks. — Chris, Jul 21 '20 at 00:48
I'm still stuck on the x input as it gets redefined inside the function, whereas y is y. — Chris, Jul 21 '20 at 20:36

Chris · Accepted Answer · 2020-07-22T20:36:50.767

The below, mainly uses your code and is merely a proof of concept of capturing history. The alpha value was arrived at by fishing around for a value that would pass the if cost increasing to create more than 1 or two records, which seemed the initial problem:

GradD2 <- function(x, y, alpha = 0.0000056, epsilon = 10^-10) {
cost <- vector(mode = 'numeric')
iter <- vector(mode = 'integer')
delta_hist <- vector(mode = 'numeric')
i <- 0
iter <- 0
x <- cbind(rep(1, nrow(x)), x)
theta <- matrix(c(1,1), ncol(x), 1)
cost <- (1/(2*nrow(x))) * t(x%*% theta -y) %*% (x %*% theta -y)
delta <- 1.0
while(length(iter) < 1000  ) { #todo - change back to while(delta>epsilon)
i <- i +1
theta <- theta - (alpha / nrow(x)) * (t(x) %*% (x %*% theta -y))
cval <- (1/(2*nrow(x))) * t(x %*% theta -y) %*% (x %*% theta -y)
cost <- append(cost, cval, after = length(cost))
delta <- abs(cost[i+1] - cost[i])
delta_hist <- append(delta_hist, delta, after = length(delta_hist))
if((cost[i+1] - cost[i]) > 0) {
print('The cost is increasing. Try reducing alpha.')
return(list(theta = theta,cost = cost, delta_hist = delta_hist, iter =  iter))
}
iter <- append(iter, i, after = length(iter))
}
print(sprintf('Completed in %i iterations.', i))
return(list(theta = theta,cost = cost, delta_hist = delta_hist, iter =  iter))
}

Results:

> str(GradD2_tst)
List of 4
 $ theta     : num [1:2, 1] 1.693 0.707
 $ cost      : num [1:1000] 88564 87541 86587 85698 84870 ...
 $ delta_hist: num [1:999] 1024 954 889 828 771 ...
 $ iter      : num [1:1000] 0 1 2 3 4 5 6 7 8 9 ...
> GradD2_tst$delta_hist[994:999]
[1] 0.08580117 0.08580094 0.08580071 0.08580048 0.08580025 0.08580002
> GradD2_tst$cost[994:999]
[1] 73493.72 73493.63 73493.54 73493.46 73493.37 73493.29
>

Absent knowing otherwise, my x and y were:

> x <- as.matrix(sample(1:1000, 400), ncol =1)
> y <- sample(1:1000, 400)

This election for x, when used with the proper while(delta > epsilon) { presented a mismatch for cval that produced a warning, and now doesn't. Dang.

Fishing for alpha again with epsilon 10^-1:

> GradD2_tst2 <- GradD2(x,y, alpha=0.006, epsilon = 10^-1)
[1] "The cost is increasing. Try reducing alpha."
> GradD2_tst2 <- GradD2(x,y, alpha=0.001, epsilon = 10^-1)
[1] "The cost is increasing. Try reducing alpha."
> GradD2_tst2 <- GradD2(x,y, alpha=0.0006, epsilon = 10^-1)
[1] "The cost is increasing. Try reducing alpha."
> GradD2_tst2 <- GradD2(x,y, alpha=0.00006, epsilon = 10^-1)
[1] "The cost is increasing. Try reducing alpha."
> GradD2_tst2 <- GradD2(x,y, alpha=0.000056, epsilon = 10^-1)
[1] "The cost is increasing. Try reducing alpha."
> GradD2_tst2 <- GradD2(x,y, alpha=0.000009, epsilon = 10^-1)
[1] "The cost is increasing. Try reducing alpha."
> GradD2_tst2 <- GradD2(x,y, alpha=0.0000056, epsilon = 10^-1)
[1] "Completed in 160 iterations."
> 
> str(GradD2_tst2)
List of 4
 $ theta     : num [1:2, 1] 1.111 0.709
 $ cost      : num [1:161] 88564 87541 86587 85698 84870 ...
 $ delta_hist: num [1:160] 1024 954 889 828 771 ...
 $ iter      : num [1:161] 0 1 2 3 4 5 6 7 8 9 ...
> GradD2_tst2$cost[155:161]
[1] 73566.06 73565.96 73565.85 73565.75 73565.65 73565.55 73565.45
> GradD2_tst2$delta_hist[155:161]
[1] 0.10493825 0.10364385 0.10243786 0.10131425 0.10026738 0.09929201         NA
>

Takes a very long time to run at epsilon = 10^-10 on one core. Mine never completed overnight, Ctl-C. HTH as to history.

storing each iteration in a gradient descent function in order to visualise the parameter updating process, and the cost covergence process in r

1 Answers1