
I am working with the output from a model in which there are parameter estimates that may not follow a-priori expectations. I would like to write a function that forces these utility estimates back in line with those expectations. To do this, the function should minimize the sum of squared deviations between the starting values and the new estimates. Since we have a-priori expectations, the optimization should be subject to the following constraints:

B0 <= B1
B1 <= B2
...
Bj <= Bj+1

For example, the raw parameter estimates below are flip-flopped for B2 and B3. The columns Delta and Delta^2 show the deviation between the original parameter estimate and the new coefficient. I am trying to minimize the sum of the Delta^2 column. I've coded this up in Excel and shown how Excel's Solver would optimize this problem given the set of constraints:

Beta    BetaRaw    Delta    Delta^2    BetaNew
B0       1.2       0        0          1.2
B1       1.3       0        0          1.3
B2       1.6       -0.2     0.04       1.4
B3       1.4       0        0          1.4
B4       2.2       0        0          2.2

After reading through ?optim and ?constrOptim, I'm not able to grok how to set this up in R. I'm sure I'm just being a bit dense, but could use some pointers in the right direction!

3/24/2012 - Added bounty since I'm not smart enough to translate the first answer.

Here's some R code that should be on the right path. Assuming that the betas start with:

betas <- c(1.2,1.3,1.6,1.4,2.2)

I want to minimize the following function such that b0 <= b1 <= b2 <= b3 <= b4

f <- function(x) {
  x1 <- x[1]
  x2 <- x[2]
  x3 <- x[3]
  x4 <- x[4]
  x5 <- x[5]

 loss <- (x1 - betas[1]) ^ 2 + 
         (x2 - betas[2]) ^ 2 + 
         (x3 - betas[3]) ^ 2 + 
         (x4 - betas[4]) ^ 2 +
         (x5 - betas[5]) ^ 2    

  return(loss)
}

To show that the function works, the loss should be zero if we pass the original betas in:

> f(betas)
[1] 0

And relatively large with some random inputs:

> set.seed(42)
> f(rnorm(5))
[1] 8.849329

And minimized at the values I was able to calculate in Excel:

> f(c(1.2,1.3,1.4,1.4,2.2))
[1] 0.04
Chase
  • On reflection, you are in fact describing an ordered logistic regression (http://en.wikipedia.org/wiki/Ordered_logit). In package `MASS` the function `polr` can solve this type of problem. There is an example at http://www.stat.washington.edu/quinn/classes/536/S/polrexample.html. Kenneth Train describes this well in his book "Discrete Choice Methods with Simulation" – Andrie Mar 28 '12 at 06:07
  • @Andrie - perhaps I just need my morning coffee, but I'm having a hard time connecting the dots between the polr example and what I need to do here. With `polr()`, isn't the goal to predict a set of proportional odds ratios? I've got Ken Train's book sitting on my bookshelf (collecting dust), so I'll give it a whirl as well. Thank you. – Chase Mar 29 '12 at 13:45
  • @Andrie +1 for Train. Note it's available online in PDF form as well. – Ari B. Friedman Jul 08 '12 at 23:42

2 Answers


1. Since the objective is quadratic and the constraints linear, you can use solve.QP from the quadprog package.

It finds the b that minimizes

(1/2) * t(b) %*% Dmat %*% b - t(dvec) %*% b 

under the constraints

t(Amat) %*% b >= bvec. 

Here, we want b that minimizes

sum( (b-betas)^2 ) = sum(b^2) - 2 * sum(b*betas) + sum(betas^2)
                   = t(b) %*% b - 2 * t(betas) %*% b + sum(betas^2).

Since the last term, sum(betas^2), is constant, we can drop it; dividing what remains by 2 gives (1/2) * t(b) %*% b - t(betas) %*% b, which is exactly solve.QP's form with

Dmat = diag(n)
dvec = betas.

The constraints are

b[1] <= b[2]
b[2] <= b[3]
...
b[n-1] <= b[n]

i.e.,

-b[1] + b[2]                       >= 0
      - b[2] + b[3]                >= 0
               ...
                   - b[n-1] + b[n] >= 0

so that t(Amat) is

[ -1  1                ]
[    -1  1             ]
[       -1  1          ]
[             ...      ]
[                -1  1 ]

and bvec is zero.

This leads to the following code.

# Sample data
betas <- c(1.2, 1.3, 1.6, 1.4, 2.2)

# Optimization
n <- length(betas)
Dmat <- diag(n)
dvec <- betas
Amat <- matrix(0,nr=n,nc=n-1)
Amat[cbind(1:(n-1), 1:(n-1))] <- -1
Amat[cbind(2:n,     1:(n-1))] <-  1
t(Amat)  # Check that it looks as it should
bvec <- rep(0,n-1)
library(quadprog)
r <- solve.QP(Dmat, dvec, Amat, bvec)

# Check the result, graphically
plot(betas)
points(r$solution, pch=16)
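
You can also compare the raw and constrained estimates side by side; for these betas, the out-of-order pair (1.6, 1.4) should end up pooled to a common value:

# Compare raw and constrained estimates numerically
cbind(raw = betas, constrained = round(r$solution, 3))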

2. You can use constrOptim in the same way (the objective function can be arbitrary, but the constraints have to be linear).
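
For instance, a minimal sketch of that route (the objective f, ui, ci, and the sorted starting value below are illustrative choices; constrOptim needs a starting value strictly inside the feasible region, and since the optimum here lies on a constraint boundary, the barrier method will only get close to it):

# Same data
betas <- c(1.2, 1.3, 1.6, 1.4, 2.2)
n <- length(betas)

# Objective: squared distance to the raw estimates
f <- function(b) sum((b - betas)^2)

# Ordering constraints in constrOptim's form ui %*% b - ci >= 0,
# i.e. b[i+1] - b[i] >= 0 for each adjacent pair
ui <- matrix(0, nrow = n - 1, ncol = n)
ui[cbind(1:(n-1), 1:(n-1))] <- -1
ui[cbind(1:(n-1), 2:n)]     <-  1
ci <- rep(0, n - 1)

# The starting value must satisfy the constraints strictly: use the sorted betas
res <- constrOptim(theta = sort(betas), f = f, grad = NULL, ui = ui, ci = ci)
res$par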

3. More generally, you can use optim if you reparametrize the problem into a non-constrained optimization problem, for instance

b[1] = exp(x[1])
b[2] = b[1] + exp(x[2])
...
b[n] = b[n-1] + exp(x[n]).

There are a few examples here or there.
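
A sketch of that reparametrization (here the first element is left unconstrained, b[1] = x[1], since the problem only requires ordering rather than positivity; to_b and the starting value are illustrative, and exact ties in the optimum correspond to exp() terms going to zero, so optim will only approach them):

betas <- c(1.2, 1.3, 1.6, 1.4, 2.2)

# Map an unconstrained x to an ordered b: b[1] = x[1], b[k] = b[k-1] + exp(x[k])
to_b <- function(x) cumsum(c(x[1], exp(x[-1])))

# Squared distance to the raw estimates, in the unconstrained parametrization
g <- function(x) sum((to_b(x) - betas)^2)

# Start from the sorted betas mapped back to x-space
# (ties in the sorted values would need a small offset before taking logs)
x0 <- c(min(betas), log(diff(sort(betas))))
fit <- optim(x0, g, method = "BFGS")
to_b(fit$par)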

Vincent Zoonekynd
  • thanks for the response. I know everything I need is contained above, but I'm not seeing it. Specifically, 1) how/where are the constraints specified, and 2) where do I input a function to calculate the deviance between b_0 and the estimates? – Chase Mar 22 '12 at 21:55
  • This is explained in `?solve.QP`: it finds the `b` that minimizes `(1/2) * t(b) %*% Dmat %*% b - t(dvec) %*% b` under the constraints `t(Amat) %*% b >= bvec`. The first two arguments of `solve.QP` define the objective function, the last two the constraints. You "just" have to put your problem under this form... – Vincent Zoonekynd Mar 22 '12 at 22:23
  • Hmmm, alright - that helps confirm my interpretation of the help page. I guess I just don't think well enough in matrices to see how I should set up `Dmat` and `Amat`...I'll keep thinking through this. Thanks! – Chase Mar 22 '12 at 22:53
  • Not sure if you're still following this, but any more specific insight to my specific optimization would be appreciated. – Chase Mar 27 '12 at 14:58
  • Awesome. Thanks for the insight, this actually makes a bit of sense now too! – Chase Mar 28 '12 at 00:30
  • One last follow up - do you have any recommendations for a gentle introduction / reference text for this sort of analysis? You know, the whole teach a man to fish thing...I pulled up the first reference in the help page for solve.QP, but it looks a bit dense. – Chase Mar 28 '12 at 04:25
  • I usually advise Cornuejols and Tutuncu's [Optimization methods in finance](http://www.amazon.com/Optimization-Methods-Finance-Mathematics-Risk/dp/0521861705). It does not assume any prior knowledge in operations research, is rather exhaustive, but the examples are limited to finance. You also can try to ask on [CrossValidated](http://stats.stackexchange.com/questions/tagged/books): they may have more general or elementary references. – Vincent Zoonekynd Mar 28 '12 at 04:50

Alright, this is starting to take form, but still has some bugs. Based on the conversation in chat with @Joran, it seems I can include a conditional that will set the loss function to an arbitrarily large value if the values are not in order. This seems to work IF the discrepancy occurs between the first two coefficients, but not thereafter. I'm having a hard time parsing out why that would be the case.

Function to minimize:

f <- function(x, x0) {
  x1 <- x[1]
  x2 <- x[2]
  x3 <- x[3]
  x4 <- x[4]
  x5 <- x[5]

 loss <- (x1 - x0[1]) ^ 2 + 
         (x2 - x0[2]) ^ 2 + 
         (x3 - x0[3]) ^ 2 + 
         (x4 - x0[4]) ^ 2 +
         (x5 - x0[5]) ^ 2    

  #Make sure the coefficients are in order
  if (any(diff(c(x1, x2, x3, x4, x5)) > 0)) loss <- 10000000

  return(loss)
}

Working example (sort of, it seems the loss would be minimized if b0 = 1.24?):

> betas <- c(1.22, 1.24, 1.18, 1.12, 1.10)
> optim(betas, f, x0 = betas)$par
[1] 1.282 1.240 1.180 1.120 1.100

Non-working example (note that the third element is still larger than the second):

> betas <- c(1.20, 1.15, 1.18, 1.12, 1.10)
> optim(betas, f, x0 = betas)$par
[1] 1.20 1.15 1.18 1.12 1.10
Chase
  • For increasing numbers, as in the original question, you probably need a penalty if `any(diff(x) < 0)`, rather than `> 0`. – Vincent Zoonekynd Mar 27 '12 at 23:12
  • Since the penalty for the constraint breaches is constant, the optimization algorithm does not know in which direction to modify an incorrect solution to improve it, and could conclude that, since the loss function looks constant, the current value, even if it is high, is a local minimum. You can make the penalty dependent on the amplitude of the breaches. `f <- function(x, x0) { loss <- sum((x-x0)^2); if( any(diff(x)<0) ) { loss <- loss + 1e9*sum(pmax(0,-diff(x))) }; loss }; optim(betas, f, x0 = betas)$par` – Vincent Zoonekynd Mar 27 '12 at 23:13
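
Spelled out, the penalized loss from that last comment is (same code, just reformatted):

f <- function(x, x0) {
  loss <- sum((x - x0)^2)               # squared distance to the raw estimates
  if (any(diff(x) < 0)) {
    # penalty proportional to the total size of the constraint breaches
    loss <- loss + 1e9 * sum(pmax(0, -diff(x)))
  }
  loss
}
optim(betas, f, x0 = betas)$par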