MATLAB: efficient generation of a large integer matrix of multi-indices

Question

Let d and p be two integers. I need to generate a large matrix A of integers, having d columns and N=nchoosek(d+p,p) rows. Note that nchoosek(d+p,p) increases quickly with d and p, so it's very important that I can generate A quickly. The rows of A are all the multi-indices with components from 0 to p, such that the sum of the components is less than or equal to p. This means that, if d=3 and p=3, then A is an [N=nchoosek(3+3,3)=20x3] matrix with the following structure:

It is not indispensable to follow exactly the row ordering I used, although it would make my life easier (for those interested, it's called graded lexicographical ordering and it's described here: http://en.wikipedia.org/wiki/Monomial_order).

In case you are curious about the origin of this weird matrix, let me know!

Maybe [generate all possibilities (Cartesian product)](http://stackoverflow.com/q/21895335/2586922) and then keep only those that meet the sum criterion? — Luis Mendo, Feb 02 '15 at 11:16
As you say: The size of the matrix will grow pretty fast. Are you sure you need all the values simultaneously? — knedlsepp, Feb 02 '15 at 12:09
possible duplicate of [combinations totaling to sum](http://stackoverflow.com/questions/21500539/combinations-totaling-to-sum) — knedlsepp, Feb 02 '15 at 14:04
The possible duplicate provides a very clever approach that is way more efficient than all current solutions. It is not exactly the same, but you can get your matrix by concatenating multiple outputs of it. — knedlsepp, Feb 02 '15 at 14:27
@LuisMendo the number of rows for all possibilities grow exponentially with d, while nchoosek(d+p,p) grows polynomially, so creating the former and pruning is definitely not acceptable. — DeltaIV, Feb 02 '15 at 15:27
@knedlsepp, if I understand correctly your suggestion, I should use Mark Dickinson's answer in the other thread, for each i between 1 and p, and then concatenate the p matrices by row (together with the initial [0 0 0...0] row). On paper, this looks great because it doesn't involve generating even bigger matrices and then pruning. I'll try it out in MATLAB and let you know how it works. — DeltaIV, Feb 02 '15 at 15:37
@knedlsepp, Mark Dickinson's answer gives the rows in the opposite order to what I need. As I don't understand the code enough to change the ordering, I used `flipud` to solve, but if there's a way to get directly the ordering I need, without unnecessary flipping, please let me know. Also, how should I proceed now? I cannot choose Mark's as the answer to my question, since it's in another thread. — DeltaIV, Feb 02 '15 at 15:45
@DeltaIV: I flagged this post as a duplicate, so if the community considers it to be a duplicate this will be closed anyway. Otherwise I did reformulate my answer using *Mark*'s post, so you could also accept my answer if it solved your problem. — knedlsepp, Feb 02 '15 at 22:53
@knedlsepp it did solve my problem indeed! Thanks to you and all the others who helped. — DeltaIV, Feb 03 '15 at 10:44

score 2 · Accepted Answer · edited May 23 '17 at 12:15

Solution using `nchoosek` and `diff`

The following solution is based on this clever answer by Mark Dickinson.

function degrees = monomialDegrees(numVars, maxDegree)
if numVars==1
    degrees = (0:maxDegree).';
    return;
end
degrees = cell(maxDegree+1,1);
k = numVars;
for n = 0:maxDegree
    dividers = flipud(nchoosek(1:(n+k-1), k-1));
    degrees{n+1} = [dividers(:,1), diff(dividers,1,2), (n+k)-dividers(:,end)]-1;
end
degrees = cell2mat(degrees);

You can get your matrix by calling monomialDegrees(d,p).

Solution using `nchoosek` and `accumarray`/`histc`

This approach is based on the following idea: There is a bijection between all k-multicombinations and the matrix we are looking for. The multicombinations give the positions, where the entries should be added. For example the multicombination [1,1,1,1,3] will be mapped to [4,0,1], as there are four 1s, and one 3. This can be either converted using accumarray or histc. Here is the accumarray-approach:

function degrees = monomialDegrees(numVars, maxDegree)
if numVars==1
    degrees = (0:maxDegree).';
    return;
end
degrees = cell(maxDegree+1,1);
degrees{1} = zeros(1,numVars);
for n = 1:maxDegree
    pos = nmultichoosek(1:numVars, n);
    degrees{n+1} = accumarray([reshape((1:size(pos,1)).'*ones(1,n),[],1),pos(:)],1);
end
degrees = cell2mat(degrees);

And here the alternative using histc:

function degrees = monomialDegrees(numVars, maxDegree)
if numVars==1
    degrees = (0:maxDegree).';
    return;
end
degrees = cell(maxDegree+1,1);
degrees(1:2) = {zeros(1,numVars); eye(numVars);};
for n = 2:maxDegree
    pos = nmultichoosek(1:numVars, n);
    degrees{n+1} = histc(pos.',1:numVars).';
end
degrees = cell2mat(degrees(1:maxDegree+1));

Both use the following function to generate multicombinations:

function combs = nmultichoosek(values, k)
if numel(values)==1
    n = values;
    combs = nchoosek(n+k-1,k);
else
    n = numel(values);
    combs = bsxfun(@minus, nchoosek(1:n+k-1,k), 0:k-1);
    combs = reshape(values(combs),[],k);
end

Benchmarking:

Benchmarking the above codes yields that the diff-solution is faster if your numVars is low and maxDegree high. If numVars is higher than maxDegree, then the histc solution will be faster.

Old approach:

This is an alternative to Dennis' approach of dec2base, which has a limit on the maximum base. It is still a lot slower than the above solutions.

function degrees = monomialDegrees(numVars, maxDegree)
Cs = cell(1,numVars);
[Cs{:}] = ndgrid(0:maxDegree);
degrees = reshape(cat(maxDegree+1, Cs{:}),(maxDegree+1)^numVars,[]);
degrees = degrees(sum(degrees,2)<=maxDegree,:);

Wow I didn't see your successive edits, where you compared the `diff` solution to the other ones. It's nice to know that there are other solutions which may be faster in some cases, but the `diff` solution is already so much faster than any pruning-based solution, that I'm more than satisfied with using it in all cases. — DeltaIV, Mar 09 '15 at 18:33

score 1 · Answer 2 · answered Feb 02 '15 at 12:32

I would solve it this way:

ncols=d;
colsum=p;
base=(0:colsum)';
v=@(dm)permute(base,[dm:-1:1]);

M=bsxfun(@plus,base,v(2));
for idx=3:ncols
    M=bsxfun(@plus,M,v(idx));
end
L=M<=colsum;
A=cell(1,ncols);
[A{:}]=ind2sub(size(L),find(L));
a=cell2mat(A);
%subtract 1 because 1 based indexing but base starts at 0
a=a-1+min(base);

It builds up a p-dimensional matrix which contains the sum. The efficiency of this code depends on sum(L(:))/numel(L), this quotient tells you how much of the created matrix is actually used for solutions. If this gets low for your intput, there probably exits a better solution.

score 1 · Answer 3 · answered Feb 02 '15 at 13:43

Here is a very easy way to do it:

L = dec2base(0:4^3-1,4);
idx=sum(num2str(L)-'0',2)<=3;
L(idx,:)

I think the first line can be very time efficient for creating a list of candidates, but unfortunately I don't know how to reduce the list in an efficient way after that.

So the second line works, but could use improvement performance wise.

MATLAB: efficient generation of a large integer matrix of multi-indices

3 Answers3

Solution using nchoosek and diff

Solution using nchoosek and accumarray/histc

Benchmarking:

Old approach:

Solution using `nchoosek` and `diff`

Solution using `nchoosek` and `accumarray`/`histc`