2

I'm trying to construct a matrix in Matlab where the sum over the rows is constant, but every combination is taken into account.

For example, take an N×M matrix where M is fixed and N depends on K, the value to which every row must sum. With K = 3 and M = 3, this gives the matrix:

[1,1,1
2,1,0
2,0,1
1,2,0
1,0,2
0,2,1
0,1,2
3,0,0
0,3,0
0,0,3]

At the moment I do this by first creating the matrix of all possible combinations without regard for the sum (so it also contains rows such as [2,2,1] and [3,3,3]), and then discarding the rows whose sum is not equal to K.

However, this is very memory-inefficient (especially for larger K and M), and I couldn't think of a nice way to construct this matrix without first constructing the full matrix.

Is this possible in a nice way? Or should I use a whole bunch of for-loops?

DGIB
  • 1
    This is a modified version of the subset sum problem: https://en.wikipedia.org/wiki/Subset_sum_problem - Essentially, you are trying to determine all combinations of numbers from a given subset of integers that give a certain sum. This is an [NP-complete](https://en.wikipedia.org/wiki/NP-completeness) problem and there is no efficient way to do this for large values of `K` and `M` unless you loop over all possible combinations. BTW, if you ever do figure out an efficient algorithm, let me know. We can [share a million dollars together](http://www.claymath.org/millennium-problems/p-vs-np-problem). – rayryeng Dec 09 '15 at 00:25

2 Answers

3

Here is a very simple version using dynamic programming. The basic idea of dynamic programming is to build up a data structure (here S) which holds the intermediate results for smaller instances of the same problem.

M=3;
K=3;
%S(k+1,m) will hold the intermediate result for k and m
S=cell(K+1,M);
%Initialisation, for M=1 there is only a trivial solution using one number.
S(:,1)=num2cell(0:K);
for iM=2:M
    for temporary_k=0:K
        for new_element=0:temporary_k
            h=S{temporary_k-new_element+1,iM-1};
            h(:,end+1)=new_element;
            S{temporary_k+1,iM}=[S{temporary_k+1,iM};h];
        end        
    end
end
final_result=S{K+1,M}
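
A quick way to gain confidence in the table is to check its output against the known count of such rows. The sketch below rebuilds the same table in compact form and then asserts three properties; the variable `expected_rows` and the checks themselves are additions for illustration, not part of the answer. It relies on the standard stars-and-bars count, nchoosek(K+M-1, M-1), for the number of ways to write K as an ordered sum of M non-negative integers.

```matlab
% Sanity check for the dynamic-programming table (illustrative sketch).
K = 3;  % target row sum
M = 3;  % number of columns

% Rebuild the table with the same three loops as in the answer.
S = cell(K+1, M);
S(:,1) = num2cell(0:K);
for iM = 2:M
    for k = 0:K
        for new_element = 0:k
            h = S{k-new_element+1, iM-1};   % solutions for the remainder
            h(:,end+1) = new_element;       % append the new last column
            S{k+1,iM} = [S{k+1,iM}; h];
        end
    end
end
final_result = S{K+1, M};

% Assumed checks (not in the original answer):
expected_rows = nchoosek(K+M-1, M-1);                          % stars-and-bars count
assert(size(final_result, 1) == expected_rows);                % right number of rows
assert(all(sum(final_result, 2) == K));                        % every row sums to K
assert(size(unique(final_result, 'rows'), 1) == expected_rows); % no duplicate rows
```

For K = 3 and M = 3 the expected count is nchoosek(5, 2) = 10, matching the ten rows in the question.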
Daniel
  • Does the number of required loops depend on `M`? Or is the code general? – Luis Mendo Dec 09 '15 at 00:40
  • It's too late; my previous comment did not answer your question. No, the number of loops does not depend on M. It's always 3 loops. The outer one increases the problem size, starting with 1 element and ending with the intended number of elements. The inner two loops recombine the solutions for smaller instances of the problem into new ones. – Daniel Dec 09 '15 at 00:59
  • 2
    @DGIB: Reading the answer again, I realized I wasted a lot of performance by increasing the solution size (outer loop) one by one. Doubling the solution size in each step instead would give a much faster solution, but it's much more difficult to code. If you are working with large M (I would guess above 40), it might be worth improving the code. – Daniel Dec 09 '15 at 12:27
1

This may be more efficient than your original approach, although it still generates (and then discards) more rows than needed.

Let M denote the number of columns, and S the desired sum. The problem can be interpreted as partitioning an interval of length S into M subintervals with non-negative integer lengths.

The idea is to generate not the subinterval lengths, but the subinterval edges; and from those compute the subinterval lengths. This can be done in the following steps:

  1. The subinterval edges are M-1 integer values (not necessarily different) between 0 and S. These can be generated as a Cartesian product using for example this answer.

  2. Sort the interval edges, and remove duplicate sets of edges. This is why the algorithm is not totally efficient: it produces duplicates. But hopefully the number of discarded tentative solutions will be less than in your original approach, because this does take into account the fixed sum.

  3. Compute subinterval lengths from their edges. Each length is the difference between two consecutive edges, including a fixed initial edge at 0 and a final edge at S.
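
To make step 3 concrete, here is a minimal sketch for one already-sorted set of edges. The names `edges` and `lengths` and the chosen values are illustrative assumptions, not taken from the code below.

```matlab
% Step 3 for a single set of edges (illustrative values).
S = 3;              % desired sum
edges = [1 3];      % M-1 = 2 sorted interior edges between 0 and S

% Pad with the fixed edges 0 and S, then take consecutive differences.
lengths = diff([0 edges S]);   % gives [1 2 0]

assert(sum(lengths) == S);     % lengths always sum to S
```

Because the padded edges start at 0 and end at S, the differences telescope, so the row sum is S for any sorted set of edges.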

Code:

%// Data
S = 3; %// desired sum
M = 3; %// number of pieces

%// Step 1 (adapted from linked answer):
combs = cell(1,M-1);
[combs{end:-1:1}] = ndgrid(0:S);
combs = cat(M+1, combs{:});
combs = reshape(combs,[],M-1);

%// Step 2
combs = unique(sort(combs,2), 'rows');

%// Step 3
combs = [zeros(size(combs,1),1) combs repmat(S, size(combs,1),1)];
result = diff(combs,[],2);

The result is sorted in lexicographical order. In your example,

result =
     0     0     3
     0     1     2
     0     2     1
     0     3     0
     1     0     2
     1     1     1
     1     2     0
     2     0     1
     2     1     0
     3     0     0
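
If the discarded duplicates are a concern, a variant of the same edge idea may avoid them. This is a different technique from the code above (the classic stars-and-bars construction), shown only as a sketch: it picks M-1 distinct bar positions among S+M-1 slots with `nchoosek`, so each set of edges appears exactly once. The names `bars`, `edges` and `n` are illustrative, and the sketch assumes M >= 2. Note that `nchoosek` with a vector argument can itself become slow for long vectors.

```matlab
% Stars-and-bars sketch: no duplicate edge sets are generated.
S = 3;  % desired sum
M = 3;  % number of pieces (assumed M >= 2)

% Choose M-1 distinct bar positions among the S+M-1 slots.
bars = nchoosek(1:S+M-1, M-1);
n = size(bars, 1);

% Pad with fixed edges at 0 and S+M. Each gap between consecutive
% edges, minus one, is the number of "stars" in that piece.
edges = [zeros(n,1) bars repmat(S+M, n, 1)];
result = diff(edges, [], 2) - 1;

assert(all(sum(result, 2) == S));   % every row sums to S
```

Each row sums to (S+M) - M = S, and the number of rows is nchoosek(S+M-1, M-1), which is exactly the number of valid rows.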
Luis Mendo