-1

I have actuals and four different models with their prediction & fitted values. With these fitted values, I want to find optimal weights so that (summation(wifi)-actuals)^2 is minimized. Here wi's are the weights I want to find optimally & fi's are the fitted values for each models.

The restrictions I am having for wi's are;

  1. Weights have to be greater than 0,
  2. Weights have to be smaller than 1,
  3. Weights must have sum of 1

I have seen a similar example here [https://stats.stackexchange.com/questions/385372/weight-optimization-in-order-to-maximize-correlation-r] but I could not replicate it for my particular problem.

Let's generate sample data to understand the problem better

actuals <- floor(runif(10, 500,1700)) 
model1_fitted <- floor(runif(10, 600,1800)) 
model2_fitted <- floor(runif(10, 400,1600)) 
model3_fitted <- floor(runif(10, 300,1500)) 
model4_fitted <- floor(runif(10, 300,1200)) 
sample_model <- data.frame(actuals, model1_fitted, model2_fitted,model3_fitted,model4_fitted)

Now, I need to optimally find (w1,w2,w3,w4) so that (summation(wifi)-actuals)^2 is minimized. I want to save the weights, as I mentioned I also have predictions from these four models. If I get the optimal weights, my predicted values for the ensemble model will be a linear function of these weights & predicted values. First predicted value of ensemble will be like below,

ensemble_pred_1 = w1*model1_pred1+w2*model2_pred1+w3*model3_pred1+w4*model4_pred1

Please help me to find optimal wi's so that I can generate the ensemble model as desired.

Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
Hindol Ganguly
  • 363
  • 1
  • 4
  • 16

1 Answers1

1

Frame your problem according to the optimization problem and calculate the required constraints:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
set.seed(123)
model1_fitted <- floor(runif(10, 600,1800)) 
model2_fitted <- floor(runif(10, 400,1600)) 
model3_fitted <- floor(runif(10, 300,1500)) 
model4_fitted <- floor(runif(10, 300,1200)) 
w <- c(0.2,0.3,0.1,0.4) # sample coefficients
sample_model <- tibble(model1_fitted, model2_fitted,model3_fitted,model4_fitted) %>%
  mutate(actuals= as.vector(as.matrix(.) %*% w)  + rnorm(10,sd=10))


X <- as.matrix(sample_model[,1:4])
y <- as.matrix(sample_model[,5])

# From solve.QP description
# solving quadratic programming problems of the form min(-d^T b + 1/2 b^T D b) with the constraints A^T b >= b_0.

# Your problem
# Minimize       || Xw - y ||^2     => Minimize 1/2 w'X'Xw - (y'X)w  => D=X'X , d= X'y
# Constraint w>0,w<1, sum(w)=1      => A'w >= b0

d <- t(X) %*% y
D <- t(X) %*% X
A <- cbind(rep(1,4),diag(4)) #constraint LHS
b0 <- c(1,numeric(4)) # constraint RHS

library(quadprog)
soln <- solve.QP(D,d,A,b0,meq = 1)
w1 <- soln$solution  # Your model wieghts
w1
#> [1] 0.20996764 0.29773563 0.07146838 0.42082836

Created on 2019-05-09 by the reprex package (v0.2.1)

Rohit
  • 1,967
  • 1
  • 12
  • 15
  • thanks for your answer. I was a bit busy so got delayed in replying. In soln <- solve.QP(D,d,A,b0,meq = 1,), why after meq=1 there is a comma? Is there any more argument needed? Secondly, I am getting error in solve.QP, "Error in solve.QP: NA/NaN/Inf in foreign function call (arg 2)" – Hindol Ganguly May 16 '19 at 12:46
  • Comma was a typo, fixed it. You'll get that error if there are NA/NaN/Inf values in your data. Remove those rows before you train your model. See: https://stackoverflow.com/questions/15773189/remove-na-nan-inf-in-a-matrix and other similiar threads – Rohit May 16 '19 at 14:53