3

I am trying to find solutions to a matrix where I know the row and column sums and the maximum value a cell can have. I want to find possible solutions that are within the constraints. I've already tried various things like constructing an array of all cell values and picking picking from each cell in sequence but whatever I try I always run into the problem where I run out of values for a cell. I also tried a recursive algorithm but that I only managed to get the first result or it failed to get any solution. I think I have to do this with a backtracking algorithm? Not sure...

Any help or pointers would be appreciated.

Row sums A, B, C, column sums X, Y, Z as well as the maximum value for each ? are known. All values are are positive integers.

    C1 | C2 | C3 
-----------------
R1 | ? | ?  | ? | A
-----------------
R2 | ? | ?  | ? | B
-----------------
R3 | ? | ?  | ? | C
-----------------
     X | Y | Z
Splatbang
  • 756
  • 1
  • 12
  • 29
  • 1
    should the solutions be integer or real ? – m.raynal Mar 21 '19 at 10:12
  • i'm only looking for integers – Splatbang Mar 21 '19 at 11:38
  • If the integers can be negative, then if an instance has any solution it has an infinite number of solutions: Pick any 2 rows a, b and any 2 columns c, d, and any integer x, and add x to positions (a, c) and (b, d), and subtract x from positions (a, d) and (b, c), and the resulting row and column sums are unchanged. – j_random_hacker Mar 21 '19 at 12:50
  • 1
    sorry, should have mentioned: only positive integer values – Splatbang Mar 21 '19 at 13:07

2 Answers2

5

If you heard about linear programming (LP) and its 'cousins' (ILP, MILP), that could be a good approach to help you solve your problem with a great efficiency.
A linear program consists in a set of variables (your matrix unknowns), constraints (maximum values, sum of rows and columns), and an objective function (here none) to minimize or maximize.

Let's call x[i][j] the values you are looking for. With the following data:

NxM the dimensions of your matrix
max_val[i][j] the maximum value for the variable x[i][j]
row_val[i] the sum of the values on the row i
col_val[j] the sum of the values on the column j

Then a possible linear program that could solve your problem is:

// declare variables
int x[N][M] // or eventually float x[N][M] 
// declare constaints
for all i in 1 .. N, j in 1 .. M, x[i][j] <= max_val[i][j]
for all i in 1 .. N, sum[j in 1 .. M](x[i][j]) == row_val[i]
for all j in 1 .. M, sum[i in 1 .. N](x[i][j]) == col_val[j]
// here the objective function is useless, but you still will need one
// for instance, let's minimize the sum of all variables (which is constant, but as I said, the objective function does not have to be useful)
minimize sum[i in 1 .. N](sum[j in 1 .. M](x[i][j]))
// you could also be more explicit about the uselessness of the objective function
// minimize 0

Solvers such as gurobi or Cplex (but there are much more of them, see here for instance) can solve this kind of problems incredibly fast, especially if your solutions do not need to be integer, but can be float (that makes the problem much, much easier). It also have the advantage to not only be faster t execute, but faster and simpler to code. They have APIs in several common programming languages to ease their use.
For example, you can reasonably expect to solve this kind of problem in less than a minute, with hundreds of thousands of variables in the integer case, millions in the real variables case.

Edit: In response to the comment, here is a piece of code in OPL (the language Cplex and other LP solvers use) that would solve your problem. We consider a 3x3 case.

// declare your problem input
int row_val[1..3] = [7, 11, 8];
int col_val[1..3] = [14, 6, 6];
int max_val[1..3][1..3] = [[10, 10, 10], [10, 10, 10], [10, 10, 10]];

// declare your decision variables
dvar int x[1..3][1..3];

// objective function
minimize 0;

// constraints
subject to {
    forall(i in 1..3, j in 1..3) x[i][j] <= max_val[i][j];
    forall(i in 1..3) sum(j in 1..3) x[i][j] == row_val[i];
    forall(j in 1..3) sum(i in 1..3) x[i][j] == col_val[j];
}

The concept of a LP solver is that you only describe the problem you want to solve, then the solver solves it for you. The problem must be described according to a certain set of rules. In the current case (Integer Linear Programming, or ILP), the variables must all be integers, and the constraints and objective function must be linear equalities (or inequalities) with regards to the decision variables.
The solver will then work as a black box. It will analyse the problem, and run algorithms that can solve it, with a ton of optimizations, and output the solution.

m.raynal
  • 2,983
  • 2
  • 21
  • 34
  • Thanks for the response. I'm not sure if I understand the code correctly. The first step is to declare my constraints, by going through all rows/columns in my matrix and filling the max_val array. In the next step I go through all elements in i of the matrix and build a sum of all elements j? Or is sum another array? – Splatbang Mar 21 '19 at 11:43
  • oh, ok, so this describes the problem for the solver and the solver then comes up with solutions to the problem. I'd have liked to come up with a 'solver' in php or js or whatever myself but this looks like a harder problem that I thought it was... – Splatbang Mar 21 '19 at 12:49
4

As you wrote in a comment, that you want to come up an own solution, here's some guideline:

Use a Backtrack algorithm to find a solution. Your value-space consists of 3*3=9 independent values, each of them are between 1 and maxval[i][j]. Your constraints will be the row and column sums (all of them must match)

Intitalize your space with all 1s, then increment them, until they reach the maxval. Evaluate the conditions only after each value is covered for that condition (particularly, after 3 values you can evaluate the first row, after 6 the second row, after 7 the first col, after 8 the second col, and after 9 the third row and the third col)

If you reach the 9th, with all conditions passing, you've got a solution. Otherwise try the values from 1 till maxval, if neither matches, step back. If the first value was iterated through, then there's no solution.

That's all.


More advanced backtracking:

Your moving values are only the top-left 2*2=4 values. The third column is calculated, the condition is that it must be between 1 and the maxval for that particular element. After defining the [1][1] element, you need to calculate the [2][2] index by using the column sum, and validate its value by the row sum (or vica versa). The same processing rules apply as above: iterate through all possible values, step back if none matches, and check rules only if they can be applied.

It is a way faster method, since you have 5 bound variables (the bottom and right rows), and only 4 unbound. These are optimizations from your particular rules. A bit more complex to implement, though.

PS: 1 is used because you have positive integers. If you have non-negative integers, you need to start with 0.

gaborsch
  • 15,408
  • 6
  • 37
  • 48
  • Specially, if the sum of the row sums does not match the sum of the col sums, then there's no solution. You can check it in advance. If that condition passed, you need to check either of those conditions while iterating in the backtrack. This applies for both variants I wrote above. – gaborsch Mar 23 '19 at 09:06
  • @Permian Which backtracking: solution 1, solution 2 or backtracking in general? – gaborsch Oct 03 '20 at 15:13
  • solution 1 usually you would set up two for loops but this wont work here – Trajan Oct 03 '20 at 15:36
  • @Permian Backtrack is one while loop. In this particular example you could use 9 for loops instead of the backtrack – gaborsch Oct 03 '20 at 15:51