control=list(maxit=2) does not work in optim function

Question

optimum_theta <- optim(par = theta_initial, fn = cost ,method = "BFGS", 
                       control=list(maxit=2))

I use the above code and add control argument, but this code will run for a long time and more than two iterations. I do not know why and how to solve this problem.

The following is all of the code that is relevant to this optimization.

cost <- function(theta){
  record   <- NULL
  record_2 <- NULL
  num <<- num + 1
  for(i in 1:124){
    
    loss <- log(1 + exp(-data_train$CellType[i] * (t(theta) %*% 
            as.numeric(data_train[i,-1228]))))
    record <- c(record,loss)
    
    loss_2 <- log(1 + exp(-data_test$CellType[i] * (t(theta) %*% 
                  as.numeric(data_test[i,-1228]))))
    record_2 <- c(record_2,loss_2)
  }
  
  training_cost[num] <<- mean(record)
  test_cost[num] <<- mean(record_2)
  
  result <- mean(record)
  return(result)
}


num <- 0
training_cost <- NULL
test_cost <- NULL

theta_initial <- rep(0, 1227)
optimum_theta <- optim(par = theta_initial, fn = cost, method = "BFGS", 
                       control=list(maxit=2))

#######the following is the update for my question#######

enter image description here

code used is like this:

colnames(data_plot)[3] <- "CellType"
n <- nrow(data_plot)
set.seed(12345)
id <- sample(1:n,floor(n * 0.5))
data_train <- data_plot[id,]
data_test <- data_plot[-id,]


data_train$CellType <- ifelse(data_train$CellType == "T-cell",1,-1)
data_test$CellType <- ifelse(data_test$CellType == "T-cell",1,-1)

data_train_p <- as.matrix(data_train[,-3])
data_test_p <- as.matrix(data_train[,-3])

lossfun <- function(theta,X, Y){
  result <- mean(log(1 + exp(-Y * (X %*%theta))))
  return(result)
}

cost <- function(theta){
  loss_train <- lossfun(theta,X = data_train_p,Y = data_train$CellType)
  loss_test <- lossfun(theta,X= data_test_p,Y = data_test$CellType)
  
  num <<- num + 1
  training_cost[num] <<- loss_train
  test_cost[num] <<- loss_test
  
  return(loss_train)
  
}

num <- 0
training_cost <- NULL
test_cost <- NULL


theta_initial <- rep(0,2)
optimum_theta <- optim(par = theta_initial,fn = cost,method = "BFGS",control=list(maxit=20))

iteration <- 1:num
data_plot <- data.frame(iteration,training_cost,test_cost)
data_plot <- reshape2::melt(data_plot,id.var = "iteration")

library(ggplot2)
ggplot(data_plot,aes(x=iteration,y= value,color= variable)) + geom_line()

Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. — Community, Mar 14 '23 at 22:29
You are using the superassignment operator `<<-`, be careful! See https://stackoverflow.com/questions/2628621/how-do-you-use-scoping-assignment-in-r — kjetil b halvorsen, Mar 14 '23 at 22:52
@kjetilbhalvorsen in this case I think it's being used as intended (and honestly is the least of the OP's problems ...) — Ben Bolker, Mar 14 '23 at 23:38

Ben Bolker · Answer 1 · 2023-03-14T23:40:10.250

optim is designed for low- to medium-dimensional optimization (i.e., not huge numbers of parameters). The particular problem that you're running into is that when optim uses a derivative-based optimizer (such as BFGS) and the gradient function is not explicitly specified by passing a grad argument, it automatically computes the gradient by finite differences: computing a gradient of length p by finite differences requires p evaluations of the objective function (plus the initial evaluation at the baseline parameter values). That means your function will be called 1228 times for each iteration of the algorithm.

You could try a derivative-free optimizer such as method = "Nelder-Mead", but that will take approximately the same number of evaluations per iteration. However, it might be more stable.

More fundamentally, however, you appear to be trying to do high-dimensional optimization, i.e. fitting a model with 1227 parameters to 124 data points. This is very unlikely to give you reasonable answers unless you use some kind of penalized algorithm (e.g., ridge or lasso) to regularize your solution.

There are a variety of other performance issues with your code that, if resolved, might speed up your objective function enough that this approach is feasible (at least in time; unless you think seriously about your high-dimensional problem you probably won't get anywhere ...)

"growing" vectors (x <- c(x, added_value)) is very slow in R
your loop can be done by a single matrix multiplication followed by some vectorized operations, which should be much faster than the loop
multiplying a numeric value by the value of a variable called CellType seems suspicious ...

I haven't actually tested this code since you haven't given us a reproducible example ...

## preallocate vectors; you can use `na.omit()` later
training_cost <- rep(NA_real_, 1e5)
test_cost <- rep(NA_real_, 1e5)
## convert to matrix once, up front
X_train <- as.matrix(data_train[, -1228])
X_test  <- as.matrix(data_train[, -1228])

lossfun <- function(theta, X, ct) {
    mean(log(1 + exp(-ct * (X %*% theta))))
}
   
cost <- function(theta) {
  
  loss_train <- lossfun(theta, X_train, data_train$CellType)
  
  ## computing test loss will slow everything down by a factor of 2,
  ##  but if you really want it ...
  loss_test <- lossfun(theta, X_test, data_test$CellType)
  
  num <<- num + 1
  training_cost[num] <<- loss_train
  test_cost[num] <<- loss_test

  return(loss_train)
}

You could also work out the gradient function analytically, which would help, or use some system that implements automatic differentiation - but again, your biggest problem is trying to do high-dimensional optimization without any form of penalization/regularization.

Thanks for your detailed answer! I helps me a lot. To my surprise, I met a new problem. This time the data set I use is only composed of three colums( two principle components and one celltype). After the modification as you suggested, I get a graph like what I add in my qeustion. Is it a good graph? — Jin Yan, Mar 15 '23 at 08:36
Please ask a new question instead of making your question into something else. Also, your new question is probably more appropriate for [CrossValidated](https://stats.stackoverflow.com) than for Stack Overflow. — Ben Bolker, Mar 15 '23 at 12:24

control=list(maxit=2) does not work in optim function

1 Answers1