3

I need to select random combinations of k elements from a set of n elements, where n can be fairly large. Given the size of the set, it is not feasible to simply use combnk or nchoosek to generate all possible combinations, and select randomly from those.

Is there an easy way to generate a unique random subset of M of those combinations?

When n is small, the following works:

M = 20; %want to pick M random combinations
n = 10; %number of elements 
k = 5;  %number of elements in each combination
allCombos = nchoosek([1:n], k);  %for large n this is not feasible
numCombos = nchoosek(n,k);
permutationsToUse = randperm(numCombos, M);
randomCombos = allCombos(permutationsToUse, :);

When n is large, this is no longer feasible.

Related Posts

Retrieve a specific permutation without storing all possible permutations in Matlab

How to randomly pick a number of combinations from all the combinations efficiently?

Select a subset of combinations

Community
  • 1
  • 1
Alessandro Cosentino
  • 2,268
  • 1
  • 21
  • 30
  • 2
    possible duplicate of [How do I randomly select k points from N points in MATLAB?](http://stackoverflow.com/questions/1856141/how-do-i-randomly-select-k-points-from-n-points-in-matlab) – Jonas Nov 16 '11 at 19:45
  • @Jonas they are not the same, that is an easier problem. The point of this question is that you don't have the N data points available, so how do you select randomly from a set of combinations that would be too large to simply enumerate (using `nchoosek`) and sample? – eric Mar 28 '15 at 16:18
  • @neuronet: The linked question answers exactly how you choose a subset of points without enumerating. – Jonas Mar 29 '15 at 16:17
  • @Jonas if it is exactly the answer I would invite you to answer this with it. I think it is different. I think you are missing some wrinkles that are present in this case, not that case. – eric Mar 30 '15 at 23:53

1 Answers1

2

You can try using randi and generate random combinations of 7 integers from 1 to Nelements and then check that you only have unique combinations:

Nelements=100;
M=10;
combsubset=randi(Nelements,[M 7]);
combsubset=unique(combsubset,'rows');

If you want to get exactly M combinations you can use a loop:

Nelements=100;
M=10;
combsubset=[];
while(size(combsubset,1)<M)
    combsubset=[combsubset;randi(Nelements,[M 7])];
    combsubset=unique(combsubset,'rows');
end
combsusbet=combsubset(1:M,:);

If you want to reuse this to get other combinations you can pretty much use the same code:

Nelements=100;
Mtotal=20
M=10;
while(size(combsubset,1)<Mtotal)
    combsubset=[combsubset;randi(Nelements,[M 7])];
    combsubset=unique(combsubset,'rows');
end
combsusbet=combsubset(1:Mtotal,:);

EDIT: Another method for your needs would be to order the combinations to be able to get only a given subset. One method to order them can be explained with the following example: if you have three indices i,j,k ranging from 0 to N-1 you can use a unique index n=i*N*N+j*N+k to go over all the possibilities. Then if you want to get the nth vector:

k=mod(n,N);
j=mod((n-k)/N,N);
i=mod((((n-k)/N)-j)/N,N);

I do not know if you will find this more elegant but with the help of a little function that uses recursion you could easily get a fixed subset of your combinations.

Aabaz
  • 3,106
  • 2
  • 21
  • 26
  • 1
    thanks for the answer. if they are not unique, I will have to generate more combinations. this doesn't seem very elegant. Also, I should have specified in the question that I further want that the next time I need combinations from the same set, the old ones should not appear again. – Alessandro Cosentino Nov 16 '11 at 20:08
  • If you need to generate more combinations then I guess you should add the newly generated subset to the old one and recheck for uniqueness (`combsubset=[combsubset;randi(Nelements,[M 7])];combsubset=unique(combsubset,'rows');`). Maybe it is not very elegant but I do not see how you can skip this test. If you are generating random combinations you are bound to encounter duplicates, especially if you keep generating more from the same set. – Aabaz Nov 16 '11 at 21:32
  • 1
    I was looking for something like `combnk`, but that lets you choose how many combinations you want. Something like `combnk(..., 10:30)`, where the last input means: give me only the 20 combinations between the 10th and the 30th that you would get if you generated all of them. – Alessandro Cosentino Nov 16 '11 at 21:40
  • @Aabaz this solution might work, but I agree with Alessandro there must be a better, more elegant solution. E.g., some function `returnCombo(n)` that returns combination *n*, without having to calculate all possible, or sort through duplicates (see first related post above). This business of going through and manually removing repeats suggests we haven't found the best answer yet. – eric Mar 30 '15 at 18:31