59

This question pops up quite often in one form or another (see for example here or here). So I thought I'd present it in a general form, and provide an answer which might serve for future reference.

Given an arbitrary number n of vectors of possibly different sizes, generate an n-column matrix whose rows describe all combinations of elements taken from those vectors (Cartesian product) .

For example,

vectors = { [1 2], [3 6 9], [10 20] }

should give

combs = [ 1     3    10
          1     3    20
          1     6    10
          1     6    20
          1     9    10
          1     9    20
          2     3    10
          2     3    20
          2     6    10
          2     6    20
          2     9    10
          2     9    20 ]
Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
  • Hey @bla, you are getting rid of quite some rep lately! :-) – Luis Mendo Apr 25 '15 at 14:40
  • 4
    I've decided to start a SO version of "The Giving Pledge", i.e. 90% of my rep is going back to the contributors , 2-3K is enough for me... – bla Apr 26 '15 at 05:17
  • 1
    Wow! That's quite a lot of rep. But consider this: you deserve that rep as much as the other contributors. If those contributors apply that criterion, everything will be redistributed, and reditributed again, to end up more or less as in the beginning :-) – Luis Mendo Apr 26 '15 at 15:02
  • 1
    @bla Anyway, I feel very honored by the bounty. Thanks! – Luis Mendo Apr 26 '15 at 15:02

4 Answers4

52

The ndgrid function almost gives the answer, but has one caveat: n output variables must be explicitly defined to call it. Since n is arbitrary, the best way is to use a comma-separated list (generated from a cell array with ncells) to serve as output. The resulting n matrices are then concatenated into the desired n-column matrix:

vectors = { [1 2], [3 6 9], [10 20] }; %// input data: cell array of vectors

n = numel(vectors); %// number of vectors
combs = cell(1,n); %// pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
%// comma-separated lists is needed to produce the rows of the result matrix in
%// lexicographical order 
combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n); %// reshape to obtain desired matrix
Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
  • 2
    This is a nice trick indeed. It is a useful way to [generalize the Cartesian product to N dimensions](http://stackoverflow.com/a/19991829/2778484). The `cat(n+1,...)` part is particularly clever. ;) – chappjc Feb 20 '14 at 00:10
  • @chappjc Thanks! [In the past](http://stackoverflow.com/a/21867835/2586922) I used a call to `cellfun` to linearize the n-dim arrays prior to concatenating them, but yes, I like this better – Luis Mendo Feb 20 '14 at 00:13
  • I'm accepting my own answer because, according to my [benchmarking](http://stackoverflow.com/a/24441502/2586922), it turns out to be faster. Thanks @horchler for your answer too! – Luis Mendo Jun 26 '14 at 23:10
  • 2
    Isn't this what [`allcomb`](http://www.mathworks.com/matlabcentral/fileexchange/10064-allcomb) function does on MATLAB file exchange? (just asking). – Autonomous Oct 23 '14 at 22:24
  • @ParagS.Chandakkar Yes, I think it does the same (I have never used that function) – Luis Mendo Oct 23 '14 at 22:32
  • 2
    I have just done some tests with `allcomb`. I confirm it produces the same result as my answer, and in the same order. As for performance, `allcomb` seems to take only slightly more time than my solution does @ParagS.Chandakkar – Luis Mendo Nov 04 '14 at 13:18
  • Very useful. I have this as a function which can also return index combinations if given a vector of the numbers of elements instead of a cell array of the vectors themselves, like this: `if ~iscell(vectors) vectors = arrayfun(@(n) {1:n}, vectors); end` – buzjwa Aug 30 '16 at 08:19
27

A little bit simpler ... if you have the Neural Network toolbox you can simply use combvec:

vectors = {[1 2], [3 6 9], [10 20]};
combs = combvec(vectors{:}).' % Use cells as arguments

which returns a matrix in a slightly different order:

combs =

     1     3    10
     2     3    10
     1     6    10
     2     6    10
     1     9    10
     2     9    10
     1     3    20
     2     3    20
     1     6    20
     2     6    20
     1     9    20
     2     9    20

If you want the matrix that is in the question, you can use sortrows:

combs = sortrows(combvec(vectors{:}).')
% Or equivalently as per @LuisMendo in the comments: 
% combs = fliplr(combvec(vectors{end:-1:1}).') 

which gives

combs =

     1     3    10
     1     3    20
     1     6    10
     1     6    20
     1     9    10
     1     9    20
     2     3    10
     2     3    20
     2     6    10
     2     6    20
     2     9    10
     2     9    20

If you look at the internals of combvec (type edit combvec in the command window), you'll see that it uses different code than @LuisMendo's answer. I can't say which is more efficient overall.

If you happen to have a matrix whose rows are akin to the earlier cell array you can use:

vectors = [1 2;3 6;10 20];
vectors = num2cell(vectors,2);
combs = sortrows(combvec(vectors{:}).')
horchler
  • 18,384
  • 4
  • 37
  • 73
  • Good suggestion. I don't have it presently, but it's good to know about. – chappjc Feb 20 '14 at 00:25
  • 2
    +1 I didn't know about that function. Pity I don't have that toolbox. Maybe instead of using `sortrows` you could save time with `combs = fliplr(combvec(vectors{end:-1:1}).')` ? – Luis Mendo Feb 20 '14 at 00:29
  • @horchler I have used this quite a few times, so felt obliged for +1 :) – Divakar Apr 21 '14 at 18:29
13

I've done some benchmarking on the two proposed solutions. The benchmarking code is based on the timeit function, and is included at the end of this post.

I consider two cases: three vectors of size n, and three vectors of sizes n/10, n and n*10 respectively (both cases give the same number of combinations). n is varied up to a maximum of 240 (I choose this value to avoid the use of virtual memory in my laptop computer).

The results are given in the following figure. The ndgrid-based solution is seen to consistently take less time than combvec. It's also interesting to note that the time taken by combvec varies a little less regularly in the different-size case.

enter image description here


Benchmarking code

Function for ndgrid-based solution:

function combs = f1(vectors)
n = numel(vectors); %// number of vectors
combs = cell(1,n); %// pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
%// comma-separated lists is needed to produce the rows of the result matrix in
%// lexicographical order
combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n);

Function for combvec solution:

function combs = f2(vectors)
combs = combvec(vectors{:}).';

Script to measure time by calling timeit on these functions:

nn = 20:20:240;
t1 = [];
t2 = [];
for n = nn;
    %//vectors = {1:n, 1:n, 1:n};
    vectors = {1:n/10, 1:n, 1:n*10};
    t = timeit(@() f1(vectors));
    t1 = [t1; t];
    t = timeit(@() f2(vectors));
    t2 = [t2; t];
end
Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
  • 2
    I never worked with matlab, so I don't know, whether my solution in Java https://stackoverflow.com/a/10083452/312172 is adoptable to mathlab. It works without generating the cartesian product, but calculating for each given index the combination of elements, given at that index. So it can be used where speed and memory usage has to be considered. It can be adopted to long or BigInteger, well, at least for long, I should have done that. A single access is always a bit time consuming, but for random access in the range of billions, it still should work in constant time. Maybe you're interested. – user unknown Feb 10 '18 at 11:57
2

Here's a do-it-yourself method that made me giggle with delight, using nchoosek, although it's not better than @Luis Mendo's accepted solution.

For the example given, after 1,000 runs this solution took my machine on average 0.00065935 s, versus the accepted solution 0.00012877 s. For larger vectors, following @Luis Mendo's benchmarking post, this solution is consistently slower than the accepted answer. Nevertheless, I decided to post it in hopes that maybe you'll find something useful about it:

Code:

tic;
v = {[1 2], [3 6 9], [10 20]};

L = [0 cumsum(cellfun(@length,v))];
V = cell2mat(v);

J = nchoosek(1:L(end),length(v));
J(any(J>repmat(L(2:end),[size(J,1) 1]),2) | ...
  any(J<=repmat(L(1:end-1),[size(J,1) 1]),2),:)  = [];

V(J)
toc

gives

ans =

 1     3    10
 1     3    20
 1     6    10
 1     6    20
 1     9    10
 1     9    20
 2     3    10
 2     3    20
 2     6    10
 2     6    20
 2     9    10
 2     9    20

Elapsed time is 0.018434 seconds.

Explanation:

L gets the lengths of each vector using cellfun. Although cellfun is basically a loop, it's efficient here considering your number of vectors will have to be relatively low for this problem to even be practical.

V concatenates all the vectors for easy access later (this assumes you entered all your vectors as rows. v' would work for column vectors.)

nchoosek gets all the ways to pick n=length(v) elements from the total number of elements L(end). There will be more combinations here than what we need.

J =

 1     2     3
 1     2     4
 1     2     5
 1     2     6
 1     2     7
 1     3     4
 1     3     5
 1     3     6
 1     3     7
 1     4     5
 1     4     6
 1     4     7
 1     5     6
 1     5     7
 1     6     7
 2     3     4
 2     3     5
 2     3     6
 2     3     7
 2     4     5
 2     4     6
 2     4     7
 2     5     6
 2     5     7
 2     6     7
 3     4     5
 3     4     6
 3     4     7
 3     5     6
 3     5     7
 3     6     7
 4     5     6
 4     5     7
 4     6     7
 5     6     7

Since there are only two elements in v(1), we need to throw out any rows where J(:,1)>2. Similarly, where J(:,2)<3, J(:,2)>5, etc... Using L and repmat we can determine whether each element of J is in its appropriate range, and then use any to discard rows that have any bad element.

Finally, these aren't the actual values from v, just indices. V(J) will return the desired matrix.

Geoff
  • 1,202
  • 10
  • 18
  • It's good to have other ways to approach the problem! Just my usual comment: `'` is not transpose; `.'` is – Luis Mendo Sep 19 '15 at 09:49
  • Thanks Luis! You're right, but after second thought it is do-able with either v' or v.' since the shape of V ends up being irrelevant. – Geoff Sep 19 '15 at 14:42
  • 1
    @Luis Mendo doh! I just stumbled upon [this post](http://stackoverflow.com/questions/25150027/using-transpose-versus-ctranspose-in-matlab) ... thanks for forgiving my ignorance :) – Geoff Dec 16 '15 at 22:14
  • Nice Q&A you found! :-D – Luis Mendo Dec 17 '15 at 00:01