
I'm doing some experimenting, and I'm certainly aware that constraining coefficients is rarely needed, but here goes.

In the following data, I have used quadprog to solve a linear model. Note that X1 is simply the intercept.

X1 <- 1
X2 <- c(4.374, 2.3708, -7.3033, 12.0803, -0.4098, 53.0631, 13.1304, 7.3617, 16.6252, 27.3394)
X3 <- c(2.6423, 2.6284, 36.9398, 15.9278, 18.3124, 54.5039, 3.764, 19.0552, 25.4906, 13.0112)
X4 <- c(4.381, 3.144, 9.506, 15.329, 21.008, 38.091, 22.399, 13.223, 17.419, 19.87)
X <- as.matrix(cbind(X1,X2,X3,X4))
Y <- as.matrix(c(37.7,27.48,24.08,25.97,16.65,73.77,45.10,53.35,61.95,71.15))
library(quadprog)
M1 <- solve.QP(t(X) %*% X, t(X) %*% Y, matrix(0, 4, 0), numeric(0))$solution

The challenge is to subject certain coefficients to constraints. I know that I should be altering the Amat and bvec parameters (according to Linear regression with constraints on the coefficients). However, I'm not sure how to set them up so that the following constraints are met.

The output reads [1] 37.3468790 1.2872473 -0.0177749 -0.5988443, where the values would be predicted fitted values of X1, X2, X3 and X4.

Constraints (subject to)…

X2 <= .899

0 <= X3 <= .500

0 <= X4 <= .334
Lindon

1 Answer


Just to clarify: You give an expected output "where the values would be predicted fitted values of X1, X2, X3 and X4". I assume you are referring to estimated (not fitted) parameter values.

It's quite straightforward to implement constrained parameters when modelling data in rstan. rstan is an R interface to Stan, which in turn is a probabilistic programming language for Bayesian statistical inference.

Here is an example based on the sample data you provide.

  1. First, let's define our model. Note that we constrain parameters beta2, beta3 and beta4 according to your requirements.

    model <- "
    data {
        int<lower=1> n;   // number of observations
        int<lower=0> k;   // number of parameters
        matrix[n, k] X;   // data
        vector[n] y;      // response
    }
    
    parameters {
        real beta1;                       // X1 unconstrained
        real<upper=0.899> beta2;          // X2 <= .899
        real<lower=0, upper=0.5> beta3;   // 0 <= X3 <= 0.5
        real<lower=0, upper=0.334> beta4; // 0 <= X4 <= 0.334
        real<lower=0> sigma;              // residual sd, must be positive
    }
    
    model {
        // Likelihood
        y ~ normal(beta1 * X[, 1] + beta2 * X[, 2] + beta3 * X[, 3] + beta4 * X[, 4], sigma);
    }"
    
  2. Next, we fit the model to your sample data.

    library(rstan)
    rstan_options(auto_write = TRUE)
    options(mc.cores = parallel::detectCores())
    df <- cbind.data.frame(Y, X)
    fit <- stan(
        model_code = model,
        data = list(
            n = nrow(df),
            k = ncol(df[, -1]),
            X = df[, -1],
            y = df[, 1]))
    
  3. Finally, let's print the parameter estimates in a tidy way.

    library(broom)
    tidy(fit, conf.int = TRUE)
    ## A tibble: 5 x 5
    #  term  estimate std.error conf.low conf.high
    #  <chr>    <dbl>     <dbl>    <dbl>     <dbl>
    #1 beta1   29.2      6.53   16.9        42.8
    #2 beta2    0.609    0.234   0.0149      0.889
    #3 beta3    0.207    0.138   0.00909     0.479
    #4 beta4    0.164    0.0954  0.00780     0.326
    #5 sigma   16.2      5.16    9.42       29.1
    

    We can also plot parameter estimates including CI.
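    A minimal sketch using rstan's built-in plot method (this assumes the `fit` object from step 2; the parameter names match the model block above):

    ```r
    # Plot posterior point estimates with credible intervals for the
    # four regression coefficients (sketch; assumes `fit` from step 2).
    plot(fit, pars = c("beta1", "beta2", "beta3", "beta4"))
    ```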

Note that parameter estimates are consistent with the imposed constraints.
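As a quick sanity check, this can also be verified programmatically from the tidy output (a sketch; assumes the `fit` object and `broom::tidy()` as above):

```r
# Extract point estimates and confirm each respects its bound
# (sketch; assumes `fit` from step 2 and broom::tidy as above).
library(broom)
est <- tidy(fit)
stopifnot(
    est$estimate[est$term == "beta2"] <= 0.899,
    est$estimate[est$term == "beta3"] >= 0,
    est$estimate[est$term == "beta3"] <= 0.5,
    est$estimate[est$term == "beta4"] >= 0,
    est$estimate[est$term == "beta4"] <= 0.334,
    est$estimate[est$term == "sigma"] > 0
)
```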

[Plot of parameter estimates with confidence intervals]

Maurits Evers
  • Super helpful. I was able to scale this approach to the full dataset and list of constraints. I am getting an error about a C++ compiler not being installed – something that I have tried to install but never seems to work. – Lindon Sep 07 '18 at 13:54