Python: Given a set of N elements, choose k at random, m times

Question

Given a set of N elements, I want to choose m random, non-repeating subsets of k elements.

If I was looking to generate all the N choose k combinations, I could have used itertools.combination, so one way to do what I m asking would be:

import numpy as np
import itertools
n=10
A = np.arange(n)
k=4
m=5
result = np.random.permutation([x for x in itertools.permutations(A,k)])[:m]
print(result)

The problem is of course that this code first generates all the possible permutations, and that this can be quite expensive.

Another suboptimal solution would be to choose each time a single permutation at random (e.g. choose-at-random-from-combinations, then sort to get permutation), and discard it if it has already been selected.

Is there a better way to do this?

Make a list of indices, pick one at random, remove from the list, repeat k times — , Feb 02 '18 at 10:19
@Tobias that would be exactly the same as picking random permutations, you would still have to check if you picked the same twice and discard it — Carlo, Feb 02 '18 at 10:21
https://stackoverflow.com/questions/15512058/python-shuffle-such-that-position-will-never-repeat — DhruvPathak, Feb 02 '18 at 10:33
Anyone else come here after using `itertools` to make an impossibly long list of combinations and crashing their Python kernel before they thought about what they were doing? — eric, Apr 12 '22 at 17:07

score 3 · Accepted Answer · answered Feb 02 '18 at 10:59

Your second solution seems to be the only practical way to do it. It will work well unless k is close to n and m is "large", in which case there will be more repetitions.

I added a count of the tries needed to get the samples we need. For m=50, with n=10 and k=4, it takes usually less than 60 tries. You can see how it goes with the size of your population and your samples.

You can use random.sample to get a list of k values without replacement, then sort it and turn it into a tuple. So, we can use a set for keeping only unique results.

import random

n = 10
A = list(range(n))
k = 4
m = 5

samples = set()
tries = 0
while len(samples) < m:
    samples.add(tuple(sorted(random.sample(A, k))))
    tries += 1

print(samples)
print(tries)

# {(1, 4, 5, 9), (0, 3, 6, 8), (0, 4, 7, 8), (3, 5, 7, 9), (1, 2, 3, 4)}
# 6
# 6 tries this time !

+1 for the code and the study. Quite smart using **samples=set()** etc.I m going with your code in my real life I think. — ntg, Feb 05 '18 at 09:42

AndreyS Scherbakov · Answer 2 · 2018-02-02T10:38:45.543

2

The simplest way to do it is to random.shuffle(range) then take first k elements (need to be repeated until m valid samples are collected).

Of course this procedure cannot guarantee unique samples. You are to check a new sample against your historical hash if you really need it.

Since Pyton2.3, random.sample(range, k) can be used to produce a sample in a more efficient way

edited Feb 02 '18 at 10:38

answered Feb 02 '18 at 10:26

AndreyS Scherbakov

2,674
2
20
27

1

Right. But dealing with all combinations is much more costly indeed. – AndreyS Scherbakov Feb 02 '18 at 10:41

Python: Given a set of N elements, choose k at random, m times

2 Answers2

Linked