4

This question is related to Applying Cost Functions in R

I would like to know how to save the coefficients generated for each iteration of optim. trace=TRUE enables me to get the coefficients for each iteration printed, but how can I save them?

Example code:

set.seed(1)
X <- matrix(rnorm(1000), ncol=10) # some random data
Y <- sample(0:1, 100, replace=TRUE)

# Implement Sigmoid function
sigmoid <- function(z) {
  g <- 1/(1+exp(-z))
  return(g)
}

cost.glm <- function(theta,X) {
  m <- nrow(X)
  g <- sigmoid(X%*%theta)
  (1/m)*sum((-Y*log(g)) - ((1-Y)*log(1-g)))
}

X1 <- cbind(1, X)

df <- optim(par=rep(0,ncol(X1)), fn = cost.glm, method='CG',
            X=X1, control=list(trace=TRUE))

Which outputs:

 Conjugate gradients function minimizer
Method: Fletcher Reeves
tolerance used in gradient test=2.00089e-11
0 1 0.693147
parameters    0.00000    0.00000    0.00000    0.00000    0.00000    0.00000    0.00000 
   0.00000    0.00000    0.00000    0.00000 
 i> 1 3 0.662066
parameters   -0.01000   -0.01601   -0.06087    0.14891    0.04123    0.03835   -0.01898 
   0.00637    0.02954   -0.01423   -0.07544 
 i> 2 5 0.638548
parameters   -0.02366   -0.03733   -0.13803    0.32782    0.09034    0.08082   -0.03978 
   0.01226    0.07120   -0.02925   -0.16042 
 i> 3 7 0.630501
parameters   -0.03478   -0.05371   -0.19149    0.43890    0.11960    0.10236   -0.04935 
   0.01319    0.10648   -0.03565   -0.20408 
 i> 4 9 0.627570.......

And dfdoes not contain any information on the coefficients, but only displays the final coefficients and the final cost:

 str(df)
List of 5
 $ par        : num [1:11] -0.0679 -0.1024 -0.2951 0.6162 0.124 ...
 $ value      : num 0.626
 $ counts     : Named int [1:2] 53 28
  ..- attr(*, "names")= chr [1:2] "function" "gradient"
 $ convergence: int 0
 $ message    : NULL
nadizan
  • 1,323
  • 10
  • 23

2 Answers2

6
## use `capture.output` to get raw output
out <- capture.output(df <- optim(par=rep(0,ncol(X1)), fn = cost.glm, method='CG',
                                  X=X1, control=list(trace=TRUE)))
## lines that contain parameters
start <- grep("parameters", out)
param_line <- outer(seq_len(start[2] - start[1] - 1) - 1, start, "+")
## parameter message
param_msg <- gsub("parameters", "", out[param_line])
## parameter matrix (a row per iteration)
param <- matrix(scan(text = param_msg), ncol = length(df$par), byrow = TRUE)

## inspect output (rounded to 2-digits for compact display)

head(round(param, 2))
#       [,1]  [,2]  [,3] [,4] [,5] [,6]  [,7]  [,8] [,9] [,10] [,11]
# [1,]  0.00  0.00  0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00  0.00
# [2,] -0.01 -0.02 -0.06 0.15 0.04 0.04 -0.02  0.01 0.03 -0.01 -0.08
# [3,] -0.02 -0.04 -0.14 0.33 0.09 0.08 -0.04  0.01 0.07 -0.03 -0.16
# [4,] -0.03 -0.05 -0.19 0.44 0.12 0.10 -0.05  0.01 0.11 -0.04 -0.20
# [5,] -0.04 -0.07 -0.23 0.51 0.14 0.11 -0.05  0.01 0.14 -0.04 -0.22
# [6,] -0.05 -0.08 -0.25 0.55 0.14 0.12 -0.05  0.01 0.16 -0.04 -0.23

tail(round(param, 2))
#[23,] -0.07 -0.10 -0.30 0.62 0.12 0.13 -0.03 -0.01 0.21 -0.04 -0.21
#[24,] -0.07 -0.10 -0.30 0.62 0.12 0.13 -0.03 -0.01 0.21 -0.04 -0.21
#[25,] -0.07 -0.10 -0.30 0.62 0.12 0.13 -0.03 -0.01 0.21 -0.04 -0.21
#[26,] -0.07 -0.10 -0.30 0.62 0.12 0.13 -0.03 -0.01 0.21 -0.04 -0.21
#[27,] -0.07 -0.10 -0.30 0.62 0.12 0.13 -0.03 -0.01 0.21 -0.04 -0.21
#[28,] -0.07 -0.10 -0.30 0.62 0.12 0.13 -0.03 -0.01 0.21 -0.04 -0.21

## one way to visualize the search steps
matplot(param, type = "l", lty = 1, xlab = "iterations")

Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
2

So, the other solution works... but involves parsing the response inside the trace. Here's an approach that gives you access to the objects directly. (And would be general in any other optimization function that doesn't allow you to easily show a text trace):

(This works because you can assign inside of an environment from within a function)

EDIT: This adds another row every time cost.glm is run, not just every time the trace is evaluated.

Also added the conversion to the matrix format used by the other solution.

set.seed(1)
X <- matrix(rnorm(1000), ncol=10) # some random data
Y <- sample(0:1, 100, replace=TRUE)


# Implement Sigmoid function
sigmoid <- function(z) {
  g <- 1/(1+exp(-z))
  return(g)
}

# Create environment to store output
# We could also use .GlobalEnv
params_env <- new.env()

# Initialize parameters object
params_env$optim_run <- list()

cost.glm <- function(theta,X) {
  # Extend the list by 1 and insert theta inside the given environment
  # This can be done more efficiently by
  # extending several at a time, but that's easy to add.
  n <- length(params_env[['optim_run']])
  params_env[['optim_run']][[n + 1]] <- theta

  m <- nrow(X)
  g <- sigmoid(X%*%theta)
  (1/m)*sum((-Y*log(g)) - ((1-Y)*log(1-g)))
}

X1 <- cbind(1, X)

df <- optim(par=rep(0,ncol(X1)), fn = cost.glm, method='CG',
            X=X1, control=list(trace=TRUE))

# View list of all param values
print(params_env$optim_run)

# Return as same format as other solution
param <- do.call(rbind, params_env[['optim_run']])

matplot(param, type = "l", lty = 1, xlab = "iterations")

As we can see, there are more jumps in this set as cost.glm explores the space

Shape
  • 2,892
  • 19
  • 31
  • This output is rather messy compared with the other solution. Why is there so many repeats? – nadizan Dec 05 '18 at 10:44
  • Its just outputting as a list. Data should be same as the other solution. If you want to return in the same format as below you can do `do.call(rbind, params_env[['optim_run']])` – Shape Dec 05 '18 at 14:42
  • If the data is different, it means that the trace output of the other solution is not returning every time the cost.glm runs. Which would be a potential issue with the other solution – Shape Dec 05 '18 at 14:48
  • The difference is that (depending on the method used) the objective function can be evaluated by `optim` multiple times per iteration. This happens most (but not exclusively) when no derivative function has been provided, and the objective is evaluated many times at points very close to the current iteration in order to find the slope (partial derivatives) in many directions. Therefore, this method will return many values per iteration for each objective evaluation done by optim, as it decides where to go for the next iteration... – dww Dec 06 '18 at 22:37
  • @dww Yeah, that makes sense. I would tend to prefer to view every evaluation of the cost function per iteration, since iterations of the cost function are a better measure of performance than iterations of the algorithm. I would treat the gradient search as part of the search... but depends on the usecase – Shape Dec 07 '18 at 17:03
  • Yes, I agree - it depends on the use case which is better. I suspect that OP just wanted iterations rather than every evaluation (i.e. the other answer), but some users may prefer your answer – dww Dec 07 '18 at 17:17