Implementing additional constraint variables in integer programming using lpSolve

Question

I'm working to implement a lpSolve solution to optimizing a hypothetical daily fantasy baseball problem. I'm having trouble applying my last constraint:

position - Exactly 3 outfielders (OF) 2 pitchers (P) and 1 of everything else
cost - Cost less than 200
team - Max number from any one team is 6
team - Minimum number of teams on a roster is 3**

Say for example you have a dataframe of 1000 players with points, cost, position, and team and you're trying to maximize average points:

library(tidyverse)
library(lpSolve)
set.seed(123)
df <- data_frame(avg_points = sample(5:45,1000, replace = T),
                 cost = sample(3:45,1000, replace = T),
                 position = sample(c("P","C","1B","2B","3B","SS","OF"),1000, replace = T),
                 team = sample(LETTERS,1000, replace = T)) %>% mutate(id = row_number())
head(df)

# A tibble: 6 x 5
#  avg_points  cost position team     id
#       <int> <int> <chr>    <chr> <int>
#1         17    13 2B       Y         1
#2         39    45 1B       P         2
#3         29    33 1B       C         3
#4         38    31 2B       V         4
#5         17    13 P        A         5
#6         10     6 SS       V         6

I've implemented the first 3 constraints with the following code, but i'm having trouble figuring out how to implement the minimum number of teams on a roster. I think I need to add additional variable to the model, but i'm not sure how to do that.

#set the objective function (what we want to maximize)
obj <- df$avg_points 
# set the constraint rows.
con <- rbind(t(model.matrix(~ position + 0,df)), cost = df$cost, t(model.matrix(~ team + 0, df)) )

#set the constraint values
rhs <- c(1,1,1,1,3,2,1,  # 1. #exactly 3 outfielders 2 pitchers and 1 of everything else
         200, # 2. at a cost less than 200
         rep(6,26) # 3. max number from any team is 6
         ) 

#set the direction of the constraints
dir <- c("=","=","=","=","=","=","=","<=",rep("<=",26))

result <- lp("max",obj,con,dir,rhs,all.bin = TRUE)

If it helps, i'm trying to replicate This paper (with minor tweaks) which has corresponding julia code here

I don't think your last condition can be expressed as linear constraint. At best I can only think of a quadratic constraint that would work, but I'm not sure if that's solvable. To be sure, you should probably ask a question in crossvalidated. — Rohit, Apr 24 '19 at 11:31
I believe it was expressed in as a linear constraint in the attached paper and code. I agree I may need to consider crossvalidated. — jasbner, Apr 24 '19 at 11:36

score 5 · Accepted Answer · edited May 01 '19 at 11:56

This might be a solution for your problem.

This is the data I have used (identical to yours):

library(tidyverse)
library(lpSolve)
N <- 1000

set.seed(123)
df <- tibble(avg_points = sample(5:45,N, replace = T),
             cost = sample(3:45,N, replace = T),
             position = sample(c("P","C","1B","2B","3B","SS","OF"),N, replace = T),
             team = sample(LETTERS,N, replace = T)) %>% 
  mutate(id = row_number())

You want to find x1...xn that maximise the objective function below:

x1 * average_points1 + x2 * average_points1 + ... + xn * average_pointsn

With the way lpSolve works, you will need to express every LHS as the sum over x1...xn times the vector you provide.

Since you cannot express the number of teams with your current variables, you can introduce new ones (I will call them y1..yn_teams and z1..zn_teams):

# number of teams:
n_teams = length(unique(df$team))

Your new objective function (ys and zs will not influence your overall objective funtion, since the constant is set to 0):

obj <- c(df$avg_points, rep(0, 2 * n_teams))

)

The first 3 constraints are the same, but with the added constants for y and z:

c1 <- t(model.matrix(~ position + 0,df))
c1 <- cbind(c1, 
            matrix(0, ncol = 2 * n_teams, nrow = nrow(c1)))
c2 = df$cost
c2 <- c(c2, rep(0, 2 * n_teams))
c3 = t(model.matrix(~ team + 0, df))
c3 <- cbind(c3, matrix(0, ncol = 2 * n_teams, nrow = nrow(c3)))

Since you want to have at least 3 teams, you will first use y to count the number of players per team:

This constraint counts the number of players per team. You sum up all players of a team that you have picked and substract the corresponding y variable per team. This should be equal to 0. (diag() creates the identity matrix, we do not worry about z at this point):

# should be x1...xn - y1...n = 0
c4_1 <- cbind(t(model.matrix(~team + 0, df)), # x
              -diag(n_teams), # y
              matrix(0, ncol = n_teams, nrow = n_teams) # z
              ) # == 0

Since each y is now the number of players in a team, you can now make sure that z is binary with this constraint:

c4_2 <- cbind(t(model.matrix(~ team + 0, df)), # x1+...+xn ==
              -diag(n_teams), # - (y1+...+yn )
              diag(n_teams) # z binary
              ) # <= 1

This is the constraint that ensures that at least 3 teams are picked:

c4_3 <- c(rep(0, nrow(df) + n_teams), # x and y
          rep(1, n_teams) # z >= 3
          )

You need to make sure that

You can use the big-M method for that to create a constraint, which is:

Or, in a more lpSolve friendly version:

In this case you can use 6 as a value for M, because it is the largest value any y can take:

c4_4 <- cbind(matrix(0, nrow = n_teams, ncol = nrow(df)),
              diag(n_teams),
              -diag(n_teams) * 6)

This constraint is added to make sure all x are binary:

#all x binary
c5 <- cbind(diag(nrow(df)), # x
            matrix(0, ncol = 2 * n_teams, nrow = nrow(df)) # y + z
            )

Create the new constraint matrix

con <- rbind(c1,
             c2,
             c3,
             c4_1,
             c4_2,
             c4_3,
             c4_4,
             c5)

#set the constraint values
rhs <- c(1,1,1,1,3,2,1,  # 1. #exactly 3 outfielders 2 pitchers and 1 of everything else
         200, # 2. at a cost less than 200
         rep(6, n_teams), # 3. max number from any team is 6
         rep(0, n_teams), # c4_1
         rep(1, n_teams), # c4_2
         3, # c4_3,
         rep(0, n_teams), #c4_4
         rep(1, nrow(df))# c5 binary
)

#set the direction of the constraints
dir <- c(rep("==", 7), # c1
         "<=", # c2
         rep("<=", n_teams), # c3
         rep('==', n_teams), # c4_1
         rep('<=', n_teams), # c4_2
         '>=', # c4_3
         rep('<=', n_teams), # c4_4 
         rep('<=', nrow(df)) # c5
         )

The problem is almost the same, but I am using all.int instead of all.bin to make sure the counts work for the players in the team:

result <- lp("max",obj,con,dir,rhs,all.int = TRUE)
Success: the objective function is 450


roster <- df[result$solution[1:nrow(df)] == 1, ]
roster
# A tibble: 10 x 5
   avg_points  cost position team     id
        <int> <int> <chr>    <chr> <int>
 1         45    19 C        I        24
 2         45     5 P        X       126
 3         45    25 OF       N       139
 4         45    22 3B       J       193
 5         45    24 2B       B       327
 6         45    25 OF       P       340
 7         45    23 P        Q       356
 8         45    13 OF       N       400
 9         45    13 SS       L       401
10         45    45 1B       G       614

If you change your data to

N <- 1000

set.seed(123)
df <- tibble(avg_points = sample(5:45,N, replace = T),
             cost = sample(3:45,N, replace = T),
             position = sample(c("P","C","1B","2B","3B","SS","OF"),N, replace = T),
             team = sample(c("A", "B"),N, replace = T)) %>% 
  mutate(id = row_number())

It will now be infeasable, because the number of teams in the data is less then 3.

You can check that it now works:

sort(unique(df$team))[result$solution[1027:1052]==1]
[1] "B" "E" "I" "J" "N" "P" "Q" "X"
sort(unique(roster$team))
[1] "B" "E" "I" "J" "N" "P" "Q" "X"

The use of an additional objective dimension appears to be useful here. I think you've got the sign of the cost constraint reversed though. — jasbner, Apr 24 '19 at 18:18
I'm struggling to understand how `c4_1` "counts" the players for each team. It appears to be working correctly, but i'm struggling to wrap my head around how the identity matrix is used in this constraint. In other words what is the significance of the dummy y variable and how is it working in this context? — jasbner, Apr 24 '19 at 19:47
the identity matrix is used because it allows that for every variable `y1 ... yn` it will be 1 once, and 0 all other times. so the first matrix in `c4_1` is indicating if a player is in a team, if he is picked, `xn * 1 == 1` and `xn * 0 == 0`. it does this for every player. you then subtract `y` that corresponds to the team and set the difference to be 0. — clemens, Apr 24 '19 at 20:54
if player 1 and 3 are in team A, but 2, 3, and 4 are not, the first row of the matrix is `c(1, 0, 1, 0, 0`. the first row of the identity matrix is `c(1, 0, 0, 0, 0)`. if you pick player one, the equation would have to work out as `1-1 = 0` . — clemens, Apr 24 '19 at 20:55
Okay but then the binarization `c4_2` doesn't really rely on those numbers right? So, for example say you have 4 teams and your solution coefficients for the `y1...y4` are `c(0, 3, 4, 3)` shouldn't the corresponding binarized z coefficients be `c(0, 1, 1, 1)`? I'm confused because I'm getting z values that don't make sense to me in relation to the y count, for example: `c(1, 1, 0, 1)`. It does not appear to correspond to the actual teams they represent/binarize. — jasbner, Apr 25 '19 at 11:44
sorry for the late reply, you are absolutely right. i have updated the answer. — clemens, Apr 29 '19 at 09:32
Thanks for the update, I've added the corresponding rhs code. — jasbner, May 01 '19 at 11:57
Unfortunately the constraint still does not hold because if y is zero z can be 1 and still satisfy the constraint. e.g. `y = c(0,3,4,3)` and `z = c(1,1,1,1)` would still satisfy all constraints. — jasbner, May 01 '19 at 12:08
no, it would not: `1 - M*0` equates to `1`, which would violate the constraint. — clemens, May 02 '19 at 06:43
My example was the reverse situation where `0 - M * 1` equates to `-6` which does not violate the constraint. — jasbner, May 02 '19 at 13:23
@clemens am I missing something here or does constraint `c4_3` not generate the solution that you asked for in your OP. Setting `rhs` to 3 and `lhs` to `>=` for constraint `c4_3` is not generating lineups with a 3-team stack. `c4_3` should be generating lineups with a 3 stack or more each time a new solution is generated but that is not always happening. — On_an_island, May 11 '23 at 18:18

Implementing additional constraint variables in integer programming using lpSolve

1 Answers1