3

I am trying to learn to fit a linear integer programming optimization model in R using the ompr package that a colleague had previously fit using CPLEX/GAMS (specifically, the one described here: Haight et al. 2021). I am running my implementation on a Linux Supercomputing server at my University that has 248gb of memory, which I'd think would be sufficient for the job.

Here is my code and output from the failure report from the server:

#Read in the necessary pre-generated data and packages

library(pacman); library(dplyr); library(ROI); library(ompr); library(ompr.roi)
n.ij = readRDS(file="nij1.rds") #An indexing vector.
B = 10 #Budget constraint--inspect only 10 lakes maximum

#Initialize model prior to setting the objective.
mod1 = MILPModel() %>% 
add_variable(u[i, j], type = "binary", i = 1:n.ij, j = 1:n.ij) %>%
add_variable(x[i], type = "binary", i = 1:n.ij) %>% 
add_variable(x[j], type = "binary", j = 1:n.ij) %>% 
add_constraint(x[i] + x[j] >= u[i,j], i = 1:n.ij, j = 1:n.ij) %>% 
add_constraint(sum_expr(x[i], i = 1:n.ij) <= B)
 
#Read in the relevant adjacency matrix of boat movements between every pair of lakes.
boats.n.ij = readRDS(file="boatsnij1.rds")

#Some system and object size info.
system(paste0("cat /proc/",Sys.getpid(),"/status | grep VmSize"))
VmSize: 13017708 kB
object.size(mod1)
6798778288 bytes

#Now, set objective with this specific boats.n.ij file.
mod1.full = mod1 %>% 
set_objective(sum_expr(u[i,j] * boats.n.ij[i, j], i = 1:n.ij, j = 1:n.ij))

Error in subCsp_ij(x, i, j, drop = drop) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 89
Calls: %>% ... [ -> callGeneric -> eval -> eval -> [ -> [ -> subCsp_ij
Execution halted

For the purposes of creating a reproducible example, mock versions of n.ij and boats.n.ij can be generated as follows:

library(Matrix)

boats = rpois(7940*7940, 2)
keep = sample(c(0,1), 7940*7940, replace=T, prob = c(0.8, 0.2))
boat.dat = boats*keep

boats.n.ij = matrix(boat.dat, nrow=7940, ncol=7940)
diag(boats.n.ij) = 0
boats.n.ij = Matrix(boats.n.ij, sparse = T)

boats.n.ij[1:10, 1:10]

n.ij = 1:7940

Why am I failing to add the objective to my model? Is it just that I am implying the existence of three very large matrices (the decision matrix u, the boats.n.ij matrix, and their product matrix)? Is it because the model is already a file that is about 6.8gb? Is there a cap on memory or object size imposed by R I am running into? Are these functions just not capable of considering an objective with this many decision points?

I can confirm that I have been able to run a scaled-down version of the model on a very small subset of boats.n.ij that optimizes just fine, so I don't think it's an issue with my model specification, but I could be wrong...I should also state explicitly I am not interested in solutions that do not involve solving this model in R, as that is the express objective here. I am open, however, to using other packages if there's a more robust one available (although I like this one otherwise).

Note: Unlike in the paper I've cited, I have eliminated the need for a vector called b.ij that my colleague does use, so that isn't the issue here.

Edit: Note that @nicola's reforming of the objective will set and solve, but the original constraints and/or variables would no longer have the same relationship with it, so it'd be fitting a different model than the one I want to fit. In the original construction, only a max of 10 values in x[i], and thus a max of 10 unique values of i within the decision variable u[i,j], would be allowed to be 1s thanks to the constraint involving our budget parameter B. In @nicola's version, far more than 10 unique values of i are permitted to be 1s within u[i,j]. It's actually unclear to me at least how the constraints as originally written interact with @nicola's objective, if at all. However, I suspect an objective like @nicola's could definitely be used to exploit the sparsity of my boats.n.ij matrix so as to avoid the "problem too large" error, but it would require the variables and/constraints to be modified accordingly. I changed the title of the question to be much clearer as to what I am looking for--I want to avoid the error but otherwise fit an equivalent model.

Second edit: @nicola's solution does work after all! However, the variables and constraints needed a little modification due to updates to ompr since I posted this question. See the following toy example:

library(Matrix)
library(slam)
library(dplyr)
library(tidyr)
library(ROI)
library(ompr)
library(ompr.roi)
library(Rglpk)
library(ROI.plugin.glpk)
library(lattice)

set.seed(101)
N = 500
boats = rpois(N*N, 2)
keep = sample(c(0,1), N*N, replace=T, prob = c(0.97, 0.03))
boat.dat = boats*keep

boats.n.ij = Matrix(boat.dat, nrow=N, ncol=N, sparse =T)
diag(boats.n.ij) = 0

boats.n.ij[1:10, 1:10]

n.ij = N
B = 5

mod1 = MIPModel() %>% 
  add_variable(u[i, j], type = "binary", i = 1:n.ij, j = 1:n.ij) %>%
  add_variable(x[i], type = "binary", i = 1:n.ij) %>% 
  add_variable(y[j], type = "binary", j = 1:n.ij) %>% 
  add_constraint(x[i] == y[j], i = 1:n.ij, j = 1:n.ij, i == j) %>% 
  add_constraint(sum_over(x[i], i = 1:n.ij) <= B) %>% 
  add_constraint(u[i,j] <= x[i] + y[j], i = 1:n.ij, j = 1:n.ij)

boatsSTM = as.simple_triplet_matrix(boats.n.ij)

#setting the objective function
mod.2nd = mod1 %>% set_objective(sum_over(u[boatsSTM$i[k], boatsSTM$j[k]] * boatsSTM$v[k], k = 1:length(boatsSTM$i)))

mod.2nd.solved = mod.2nd %>% 
  solve_model(with_ROI("glpk", verbose=TRUE))


testB = get_solution(mod.2nd.solved, u[i,j])
test2B = pivot_wider(testB, names_from = j, values_from = value) %>% dplyr::select(-variable, -i)
test3B = as.matrix(test2B, nrow=100)

levelplot(test3B)
Bajcz
  • 433
  • 5
  • 20
  • 1
    Strongly related: [What does the the cholmod error 'Problem too large' mean?](https://stackoverflow.com/questions/58302449/what-does-the-cholmod-error-problem-too-large-means-exactly-problem-when-conv) – Gregor Thomas Dec 13 '21 at 17:49
  • @GregorThomas thanks for linking that post. I've looked at it but not sure what to do with it...does it suggest that R is handicapped when working with large matrices? Or does it suggest that some operations convert sparse matrices to dense ones and that this needs to be prevented in some circumstances somehow? – Bajcz Dec 13 '21 at 18:05
  • 2
    I'm posting it as a comment, not an answer, because I think it's a useful clue to consider. We can see from your error message that the code throwing the error is `cholmod_sparse.c`, which is part of base R, not the `ompr` package. I didn't dive in enough to know if the issue is the same--trying to change a sparse matrix to dense, but I wanted to leave it as a breadcrumb for anyone else looking at your question. – Gregor Thomas Dec 13 '21 at 18:14
  • What's the sparsity? – Kathy Godwin Dec 13 '21 at 20:49
  • Quite low...Maybe 1-5% of the table has positive values. However, there is no column or row that has no positive values in it--I've already "pruned" as many dimensions at I could in that sense. – Bajcz Dec 13 '21 at 22:02
  • Another breadcrumb, to borrow @GregorThomas's terminology! https://stackoverflow.com/questions/1395229/increasing-or-decreasing-the-memory-available-to-r-processes. Looks like the `unix` package can be used to increase R's limits if the OS allows it. But I'm not quite sure how to take advantage...or if that's the problem here. – Bajcz Dec 14 '21 at 16:56
  • Is `boats.n.ij` a dense or a `SparseMatrix`? If it's sparse, you don't need to run on every `(i,j)`, but just on an index of not-zero elements. – nicola Dec 16 '21 at 04:56
  • It is a Sparse matrix as constructed by the `Matrix` package, although I know that `slam` matrices have some advantages... – Bajcz Dec 17 '21 at 16:47

1 Answers1

1

An attempt:

require(slam)
boatsSTM<-as.simple_triplet_matrix(boats.n.ij)
...

#setting the objective function
set_objective(sum_expr(u[boatsSTM$i[k], boatsSTM$j[k]] * boatsSTM$v[k], k = 1:length(boatsSTM$i)))

We exploit the sparsity of the matrix. In a simple triplet matrix you just list the values that are not zero, implying that if an element is not listed is equal to zero. The values are denoted with a (i, j, v) triplet, where i denotes the row index, j the column index and v the value. So, for instance, a (2, 4, 10.32) triplet denotes that m[2, 4] = 10.32.

In your sum_expr line, we exploit this and just add the elements that are different from zero. We don't multiply each element of u with each element of boats, since most are zero and irrelevant to the sum; rather we just do the above for the elements that matter.

The slam package implements the simple triplet matrix which is, at its root, just a list of i, j and v values.

nicola
  • 24,005
  • 3
  • 35
  • 56
  • Thanks! I was wondering if it might be possible to force these functions to only work with sparse matrices. I will try this now and see if it works. – Bajcz Dec 17 '21 at 16:32
  • It did work! And it fit in 2 seconds, astonishingly! Can you explain this answer more so that I can fully follow what you did and how these simple triplet matrices work and how you can use these embedded vectors `i`, `j`, and `v` as you are here (and where they come from)? This is pretty outside of my experience base...I will give the bounty but would appreciate a follow-up if it's not too much to ask! – Bajcz Dec 20 '21 at 14:17
  • 1
    @Bajcz I expanded the answer a little. – nicola Dec 20 '21 at 15:28
  • Perfect, this is exactly what I was looking for. Cheers. – Bajcz Dec 20 '21 at 18:24
  • Sorry--I hadn't done my due diligence here. This solution works for getting the objective to set and the model to solve, but it doesn't preserve the original intents. The budget constraint, `B`, no longer applies as it should because values in boatsSTM$i are not unique...any ideas how to reframe the variables/constraints? – Bajcz Jan 20 '22 at 16:06
  • So you remove the "accept" because you have another issue in another part? Is this what you are saying? Please clarify. – nicola Jan 20 '22 at 16:48
  • Correct--I think it is bad to allow the "accept" if the solution only allows the objective to be set but not maintain congruence with the rest of the problem (the objective sets and solves, as desired, but provides a different solution than the one intended because it restructures the data but not the constraints and variables). I am sure the variables and/or constraints can be reformed to work with the new objective but I'm too unfamiliar with STMs to see how at the moment...would give the check mark back for this! :) – Bajcz Jan 20 '22 at 17:10
  • You asked what caused the original error and that is fixed (look at the title of your question). You don't even provide any evidence on what is "restructured" and that wasn't even in your question, neither any hint that it's due to the line I suggested. If you diagnosed something that went wrong, just build a reproducible example and ask another question. As it is, it's totally impossible for anybody to do something that might help you. – nicola Jan 20 '22 at 17:34
  • Sorry, I see your point--I guess I thought it went without saying that I wanted to avoid the error *and in doing so fit the equivalent model.* Your line doesn't do that, unfortunately. That desire might make this more a statistical/modeling question than a programming one, especially since you have indeed found a potential programming solution around the error, although I am unsure if there is a way to still solve the same model using your new objective. I've edited the question and title accordingly, but I may need to just delete/close this post and repost on CrossValidated or elsewhere. – Bajcz Jan 20 '22 at 18:29
  • Ok, I see your point, but I don't think it's totally fair to ask a question, get an answer that fits the question and only then "disaccept" the answer because you didn't ask the question you had in mind. Plus, it's still not clear what doesn't work. In my shoes, what do you think I could do to solve this new question if you didn't even bother to provide some detail? Where are the evidence that " the original constraints and/or variables would no longer have the same relationship with it" and that is due to that line (this is very unlikely IMO)? – nicola Jan 21 '22 at 08:07
  • Since my solution just involves setting the objective function, I'm pretty sure you have bug somewhere else. It's not possible that the line changes something about the constraints. – nicola Jan 21 '22 at 11:58
  • It doesn't change the constraints, and that's the issue--the model is a function of the constraints, variables, and objectives. If you change one without changing the others, you are essentially changing the model--the same constraints placed on a new objective will yield a different solution. – Bajcz Jan 21 '22 at 14:40
  • I think it's fair though that I provide a toy example to demonstrate this--I will work on this and add it to the question. – Bajcz Jan 21 '22 at 14:52
  • I really don't understand the problem. My line just replaces the `set_objective`. Of course the constraints need to be set in both cases. Please clarify. – nicola Jan 21 '22 at 14:52
  • My bad. All is well. There were just some updates to the `ompr` package since I posted this post that required some tweaks and there was a bug in my plotting code that made the solution look incorrect that I failed to catch also. Check mark restored and with my sincerest apologies. – Bajcz Jan 21 '22 at 19:44