1

I have a large two-step optimization problem that I've tried to simplify for this question. The first step is to choose 10 elements to maximize utility with certain constraints. I need 200 of these sets, but due to the nature of what I'm trying to do, there needs to be 600 generated so that the correct combinations can manifest.

Wrapping around these mini-optimization problems is a larger constraint where each individual element can only be used within a certain range. The first optimization tunes each element's utility so that each one is relatively close to the bounds, but it's not possible for all of them to be within their bounds. Therefore, the second step is to choose 200 of the 600 sets such that each individual element's min/max usage is satisfied. This is what I need help with.

I made a function using lpSolve that works, but over 80% of the time it freezes RStudio and it's just becoming too much of a hassle - I need to either improve my current approach, or need a completely new approach. I don't know if lpSolve is really the best approach to begin with. While I do have an overall set-score that I can maximize, all I really care about is having each element within the bounds. I've made a simplified example to get at the essence of my problem.


I'm in charge of making 200 meals from a set of 80 different fruits. Each meal uses 10 fruit and cannot have more than 1 of the same fruit. I'm limited in the number of fruits that I have (and my boss is making me use a minimum of each fruit otherwise they'll go bad), so they need to be within certain bounds. I have a list of 600 meals already created (Meals), and each one has it's own unique Health-Score. Ideally I would like to maximize the Health-Score, but obviously the most important piece is that each fruit is used the correct number of times, otherwise the meals can't be made in the first place.

Here's my code to 1) Setup the 600 Meals (random) 2) Set the min/max times each fruit must be used (random) 3) Run a linear optimization to select 200 of the 600 meals such that the individual fruit constraints are fulfilled. The program tries to chose 200 of the 600, but if the constraints don't allow it, then it loosens the constraints (e.g. if the solver doesn't work the first time, then I'll decrease the minimum number of times an Apple can be used, and increase the maximum number of times it can be used). It does this one fruit at a time, rather than all at once. Eventually the constraints should be loosened so much that any 200 of the 600 will work (i.e. when all fruit minPercent is less than 0 and all fruit maxPercent is greater than 100), but it doesn't matter because R freezes up.

library(stringr)
library(dplyr)
library(lpSolve)

# Inputs
MealsNeeded <- 200
Buffer <- 3

# Setup the meals (this is the output of another optimizer in my actual program. Considered "Step 1" as I mentioned above)
Meals <- data.frame()
for(i in 1:(MealsNeeded*Buffer)){

  run <- i
  meal <- sample(fruit, 10)
  healthFactor <- round(runif(1, 10, 30), 0) #(Health factor for the entire meal)

  df <- data.frame(Run = run, Fruit = meal, healthFactor = healthFactor, stringsAsFactors = FALSE)

  Meals <- rbind(Meals, df)

}

# The minimum/maximum number of times each fruit must be used across all 200 meals (these would be inputs in my program)
set.seed(11)
fruitDF <- data.frame(Name = fruit, minSelectPct = round(runif(length(fruit), .05, .1)*100, 0), stringsAsFactors = FALSE) %>% 
  mutate(maxSelectPct = round(minSelectPct/2 + runif(length(fruit), .05, .1)*100, 0))

#### Actual Program Start

# Get objective
obj <- Meals %>% 
  distinct(Run, healthFactor) %>% 
  ungroup() %>% 
  select(healthFactor) %>% 
  pull()

# Dummy LU - for each fruit give 1/0 whether or not they were in the meal
dummyLUInd <- data.frame(FruitName = fruitDF$Name, stringsAsFactors = FALSE)
for(i in unique(Meals$Run)){

  selectedFruit <- Meals %>%
    filter(Run == i) %>% 
    select(Fruit) %>% 
    mutate(Indicator = 1)

  dummyLUIndTemp <- fruitDF %>% 
    left_join(selectedFruit, by = c('Name' = 'Fruit')) %>% 
    mutate(Indicator = ifelse(is.na(Indicator), 0, Indicator)) %>% 
    select(Indicator)

  dummyLUInd <- cbind(dummyLUInd, dummyLUIndTemp)
}

## Table create
dummyLUInd <- rbind(dummyLUInd, dummyLUInd)[,-1]
dummyLUInd <- as.data.frame(t(dummyLUInd))
dummyLUInd$Total = 1

## Directions
dirLT <- c(rep('<=', (ncol(dummyLUInd)-1)/2))
dirGT <- c(rep('>=', (ncol(dummyLUInd)-1)/2))
## Multiply percentages by total Meals
MinExp = round(fruitDF$minSelectPct/100 * MealsNeeded - 0.499, 0) 
MaxExp = round(fruitDF$maxSelectPct/100 * MealsNeeded + 0.499, 0)

# Setup constraints like # of tries
CounterMax <- 10000
LPSum = 0
Counter = 0

# Create DF to make it easier to change constraints for each run
MinExpDF <- data.frame(Place = 1:length(MinExp), MinExp = MinExp)
MaxExpDF <- data.frame(Place = 1:length(MaxExp), MaxExp = MaxExp)
cat('\nStarting\n')
Sys.sleep(2)

# Try to get the 200 of 600 Meals that satisfy the constraints for the individual Fruit.
# If the solution doesn't exist, loosen the constraints for each fruit (one at a time) until it does work
while (LPSum == 0 & Counter <= CounterMax) {
  rowUse <- Counter %% length(MaxExp)

  # Knock one of minimum, starting with highest exposure, one at a time
  MinExpDF <- MinExpDF %>%
    mutate(Rank = rank(-MinExp, na.last = FALSE, ties.method = "first"),
           MinExp = ifelse(Rank == rowUse, MinExp - 1, MinExp)
    )
  MinExp <- MinExpDF$MinExp

  # Add one of maximum, starting with highest exposure, one at a time
  MaxExpDF <- MaxExpDF %>%
    mutate(Rank = rank(-MaxExp, na.last = FALSE, ties.method = "first"),
           MaxExp = ifelse(Rank == rowUse, MaxExp + 1, MaxExp))
  MaxExp <- MaxExpDF$MaxExp


  # Solve
  dir <- 'max'
  f.obj <- obj
  f.mat <- t(dummyLUInd)
  f.dir <- c(dirGT, dirLT, '==')
  f.rhs <- c(MinExp, MaxExp, MealsNeeded)
  Sol <- lp(dir, f.obj, f.mat, f.dir, f.rhs, all.bin = T)$solution
  LPSum <- sum(Sol)

  Counter = Counter + 1
  if(Counter %% 10 == 0) cat(Counter, ', ', sep = '')
}

# Get the Run #'s from the lpSolve
if(Counter >= CounterMax){
  cat("Unable to find right exposure, returning all Meals\n")
  MealsSolved <- Meals
} else {
  MealsSolved <- data.frame(Run = unique(Meals$Run))
  MealsSolved$selected <- Sol
  MealsSolved <- MealsSolved[MealsSolved$selected == 1,]
}
# Final Meals
FinalMeals <- Meals %>% 
  filter(Run %in% MealsSolved$Run)

If you run this code enough times, eventually RStudio will freeze up on you (at least it does for me, if it doesn't for you I suppose increase the number of Meals). It happens during the actual lp, so there's really not much you can do since it's really C code. This is where I'm lost.

A part of me thinks this isn't really an lpSolve issue since I'm not really trying to maximize anything (Health-Factor isn't all too important). My real "loss function" is the number of times each fruit goes above/below their min/max exposure, but I can't think of how to set something like this up. Can my current approach work, or do I need to do something different completely?

CoolGuyHasChillDay
  • 659
  • 1
  • 6
  • 21
  • Have you tried profiling your code? https://cougrstats.wordpress.com/2018/09/24/code-profiling-monitoring-and-speeding-up-your-code/ – Tung Oct 28 '18 at 03:01
  • 1
    @Tung I have and I know that my code could be more efficient, but the issue really is just the `lp` function which is a C function (you can't even break out of it in RStudio). I could maybe use `lpSolveAPI` which is more robust, but I've tried using it before and it gives me a headache haha. Don't want to waste my time using that if it won't work anyways. Thanks for the resource. – CoolGuyHasChillDay Oct 28 '18 at 04:01
  • You will probably gain some speed if you eliminate all the `%>%` inside `for-loop` https://stackoverflow.com/questions/38880352/should-i-avoid-programming-packages-with-pipe-operators – Tung Oct 28 '18 at 04:14
  • In your Setup step, you're growing objects within `for-loop` which is highly inefficient and not recommended in `R`. Please see these great posts to find out better ways to do it: [Efficient accumulation in R](http://www.win-vector.com/blog/2015/07/efficient-accumulation-in-r/), [Applying a function over rows of a data frame](https://rpubs.com/wch/200398) – Tung Oct 28 '18 at 04:17
  • Yep, I realize I lose performance the way I grow with a for-loop (along with other things in this simplified post), but it's not speed I'm looking for. I just want it to work. And the reason it's not working is due to the linear optimization being too large (I assume). I'm not sure if you ran the code, but if you did you'd see that it takes less than 5 seconds to get to the line that runs the `lp` function. And once it gets to a point where a solution does exist, it freezes R. – CoolGuyHasChillDay Oct 28 '18 at 05:13

0 Answers0