How to optimize for integer parameters (and other discontinuous parameter space) in R?

Question

How does one optimize if the parameter space is only integers (or is otherwise discontinuous)?

Using an integer check in optim() does not seem to work and would be very inefficient anyways.

fr <- function(x) {   ## Rosenbrock Banana function
  x1 <- x[1]
  x2 <- x[2]
  value<-100 * (x2 - x1 * x1)^2 + (1 - x1)^2

  check.integer <- function(N){
    !length(grep("[^[:digit:]]", as.character(N)))
  }

  if(!all(check.integer(abs(x1)), check.integer(abs(x2)))){
   value<-NA 
  }
  return(value)

}
optim(c(-2,1), fr)

http://r.789695.n4.nabble.com/non-linear-integer-optimization-td2551409.html — Etienne Low-Décarie, Jun 21 '12 at 15:51

Vincent Zoonekynd · Accepted Answer · 2012-06-21T03:07:47.330

Here are a few ideas.

1. Penalized optimization. You could round the arguments of the objective function and add a penalty for non-integers. But this creates a lot of local extrema, so you may prefer a more robust optimization routine, e.g., differential evolution or particle swarm optimization.

fr <- function(x) {
  x1 <- round( x[1] )
  x2 <- round( x[2] )
  value <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
  penalty <- (x1 - x[1])^2 + (x2 - x[2])^2
  value + 1e3 * penalty
}

# Plot the function
x <- seq(-3,3,length=200)
z <- outer(x,x, Vectorize( function(u,v) fr(c(u,v)) ))
persp(x,x,z,
  theta = 30, phi = 30, expand = 0.5, col = "lightblue", border=NA,
  ltheta = 120, shade = 0.75, ticktype = "detailed")

perspective plot

library(RColorBrewer)
image(x,x,z, 
  las=1, useRaster=TRUE,
  col=brewer.pal(11,"RdYlBu"),
  xlab="x", ylab="y"
)

image plot

# Minimize
library(DEoptim)
library(NMOF)
library(pso)
DEoptim(fr, c(-3,-3), c(3,3))$optim$bestmem
psoptim(c(-2,1), fr, lower=c(-3,-3), upper=c(3,3))
DEopt(fr, list(min=c(-3,-3), max=c(3,3)))$xbest
PSopt(fr, list(min=c(-3,-3), max=c(3,3)))$xbest

2. Exhaustive search. If the search space is small, you can also use a grid search.

library(NMOF)
gridSearch(fr, list(seq(-3,3), seq(-3,3)))$minlevels

3. Local search, with user-specified neighbourhoods. Without tweaking the objective function, you could use some form of local search, in which you can specify which points to examine. This should be much faster, but is extremely sensitive to the choice of the neighbourhood function.

# Unmodified function
f <- function(x) 
  100 * (x[2] - x[1] * x[1])^2 + (1 - x[1])^2

# Neighbour function
# Beware: in this example, with a smaller neighbourhood, it does not converge.
neighbour <- function(x,...)
  x + sample(seq(-3,3), length(x), replace=TRUE)

# Local search (will get stuck in local extrema)
library(NMOF)
LSopt(f, list(x0=c(-2,1), neighbour=neighbour))$xbest
# Threshold Accepting
TAopt(f, list(x0=c(-2,1), neighbour=neighbour))$xbest

4. Tabu search. To avoid exploring the same points again and again, you can use tabu search, i.e., remember the last k points to avoid visiting them again.

get_neighbour_function <- function(memory_size = 100, df=4, scale=1){
  # Static variables
  already_visited <- NULL
  i <- 1
  # Define the neighbourhood
  values <- seq(-10,10)
  probabilities <- dt(values/scale, df=df)
  probabilities <- probabilities / sum(probabilities)
  # The function itself
  function(x,...) {
    if( is.null(already_visited) ) {
      already_visited <<- matrix( x, nr=length(x), nc=memory_size )
    }
    # Do not reuse the function for problems of a different size
    stopifnot( nrow(already_visited) == length(x) )
    candidate <- x
    for(k in seq_len(memory_size)) {
      candidate <- x + sample( values, p=probabilities, length(x), replace=TRUE )
      if( ! any(apply(already_visited == candidate, 2, all)) )
        break
    }
    if( k == memory_size ) {
      cat("Are you sure the neighbourhood is large enough?\n")
    } 
    if( k > 1 ) {
      cat("Rejected", k - 1, "candidates\n")
    }
    if( k != memory_size ) {
      already_visited[,i] <<- candidate
      i <<- (i %% memory_size) + 1
    }
    candidate
  }
}

In the following example, it does not really work: we only move to the nearest local minimum. And in higher dimensions, things get even worse: the neighbourhood is so large that we never hit the cache of already visited points.

f <- function(x) {
  result <- prod( 2 + ((x-10)/1000)^2 - cos( (x-10) / 2 ) )  
  cat(result, " (", paste(x,collapse=","), ")\n", sep="")
  result
}
plot( seq(0,1e3), Vectorize(f)( seq(0,1e3) ) )

LSopt(f, list(x0=c(0,0), neighbour=get_neighbour_function()))$xbest
TAopt(f, list(x0=c(0,0), neighbour=get_neighbour_function()))$xbest
optim(c(0,0), f, gr=get_neighbour_function(), method="SANN")$par

Differential evolution works better: we only get a local minimum, but it is better than the nearest one.

g <- function(x) 
  f(x) + 1000 * sum( (x-round(x))^2 )
DEoptim(g, c(0,0), c(1000,1000))$optim$bestmem

Tabu search is often used for purely combinatorial problems (e.g., when the search space is a set of trees or graphs) and does not seem to be a great idea for integer problems.

Textbook quality answer! Beautiful plots illustrate the problem! Thanks a lot! I though there would be a specific method for discontinuous and/or integer functions. You may be the best person to answer this side question: Is there a way to optimize given a parameter space of vectors for each parameter (eg. 1:1000) without simply exploring this whole parameter space? This might be much more efficient than searching a penalized continuous bounded space, even if we must search the whole space (which would still be a small space compared to the comparable continuous space). — Etienne Low-Décarie, Jun 20 '12 at 14:40
If the search space is not too large, you can use an exhaustive search (grid search). Otherwise, you can search a large discrete space by choosing a neighbourhood function and using some local algorithm: local search, threshold accepting, simulated annealing (@HansWerner's answer). Threshold accepting is similar to simulated annealing, but more deterministic: it converges faster, but less often, and is more sensitive to the choice of neighbourhood. You should test with your actual function: simulated annealing is usually much slower, but the restriction to integers should speed it up. — Vincent Zoonekynd, Jun 20 '12 at 15:15
The fact that both LSopt and TAopt don't remember where they have been means they tend to turn in circles in my function. — Etienne Low-Décarie, Jun 20 '12 at 16:56
[Tabu search](http://en.wikipedia.org/wiki/Tabu_search) is a variant of those algorithms that remembers which points have already been explored. There is a `tabuSearch` package, but unfortunately it is limited to binary search spaces. However, you could modify the neighbourhood function of `LSopt` or `TAopt` so that it remembers the last k points and avoids them. — Vincent Zoonekynd, Jun 21 '12 at 00:29
I have edited my answer with more details about tabu search, but it does not seem to be adapted to integer problems. — Vincent Zoonekynd, Jun 21 '12 at 03:08

score 16 · Answer 2 · edited Jun 20 '12 at 21:27

Integer Programming (IP) has its own rules and algorithms. Using a continuous solver does not make much sense. R does not have a specialized integer programming solver, but you could try:

If your function is linear use one of the mixed integer programming solvers such as lp_solve as "lpSolve" in R or GLPK as "Rglpk" in R.
Otherwise, you might try optim with method "SANN", a simulated annealing approach, about which the documentation says:

"It uses only function values but is relatively slow... If a function to generate a new candidate point is given, method 'SANN' can also be used to solve combinatorial optimization problems... Note that the 'SANN' method depends critically on the settings of the control parameters."

Here is an example with the translated sphere function in [-10,10]x[-10,10]:

fun <- function(x) sum((x-c(3.2, 6.7))^2)
nextfun <- function(x) sample(-10:10, 2, replace=TRUE)

optim(fn=fun, par=c(-10,-10), gr=nextfun, method="SANN", 
      control=list(maxit=1000,fnscale=1,trace=10))

# sann objective function values
# initial       value 458.000000
# iter      999 value 0.000000
# final         value 0.000000
# sann stopped after 999 iterations
# $par
# [1] 3 7
# $value
# [1] 0.13

But you should apply a more intelligent 'gradient' that random sampling, or a complete search through your domain of integers if nothing else helps. Of course, in higher dimensions a specialized approach will be needed.

I wish I could accept two answers as they are both complimentary and form a perfect complete answer together! Thanks a lot! — Etienne Low-Décarie, Jun 20 '12 at 16:59

score 9 · Answer 3 · answered May 18 '14 at 09:38

There are new packages available in R which allow the use of discontinuous input parameters (For instance integer) in optimization programs. One of them is rgenoud

Using the option "data.type.int=TRUE" and by setting the correct boundaries the function will use only integers to minimize or maximize a given function.

Underneath rgenoud uses stats::optim() For optimization. Therefore the user is able to pass any options to rgenoud it would normally pass to optim()

How to optimize for integer parameters (and other discontinuous parameter space) in R?

3 Answers3

Linked