7

Given a set of N elements, I want to choose m random, non-repeating subsets of k elements.

If I was looking to generate all the N choose k combinations, I could have used itertools.combination, so one way to do what I m asking would be:

import numpy as np
import itertools
n=10
A = np.arange(n)
k=4
m=5
result = np.random.permutation([x for x in itertools.permutations(A,k)])[:m]
print(result)

The problem is of course that this code first generates all the possible permutations, and that this can be quite expensive.

Another suboptimal solution would be to choose each time a single permutation at random (e.g. choose-at-random-from-combinations, then sort to get permutation), and discard it if it has already been selected.

Is there a better way to do this?

ntg
  • 12,950
  • 7
  • 74
  • 95
  • Make a list of indices, pick one at random, remove from the list, repeat k times –  Feb 02 '18 at 10:19
  • 1
    @Tobias that would be exactly the same as picking random permutations, you would still have to check if you picked the same twice and discard it – Carlo Feb 02 '18 at 10:21
  • https://stackoverflow.com/questions/15512058/python-shuffle-such-that-position-will-never-repeat – DhruvPathak Feb 02 '18 at 10:33
  • 1
    Anyone else come here after using `itertools` to make an impossibly long list of combinations and crashing their Python kernel before they thought about what they were doing? – eric Apr 12 '22 at 17:07

2 Answers2

3

Your second solution seems to be the only practical way to do it. It will work well unless k is close to n and m is "large", in which case there will be more repetitions.

I added a count of the tries needed to get the samples we need. For m=50, with n=10 and k=4, it takes usually less than 60 tries. You can see how it goes with the size of your population and your samples.

You can use random.sample to get a list of k values without replacement, then sort it and turn it into a tuple. So, we can use a set for keeping only unique results.

import random

n = 10
A = list(range(n))
k = 4
m = 5

samples = set()
tries = 0
while len(samples) < m:
    samples.add(tuple(sorted(random.sample(A, k))))
    tries += 1

print(samples)
print(tries)

# {(1, 4, 5, 9), (0, 3, 6, 8), (0, 4, 7, 8), (3, 5, 7, 9), (1, 2, 3, 4)}
# 6
# 6 tries this time !
Thierry Lathuille
  • 23,663
  • 10
  • 44
  • 50
  • +1 for the code and the study. Quite smart using **samples=set()** etc.I m going with your code in my real life I think. – ntg Feb 05 '18 at 09:42
2

The simplest way to do it is to random.shuffle(range) then take first k elements (need to be repeated until m valid samples are collected).

Of course this procedure cannot guarantee unique samples. You are to check a new sample against your historical hash if you really need it.

Since Pyton2.3, random.sample(range, k) can be used to produce a sample in a more efficient way

AndreyS Scherbakov
  • 2,674
  • 2
  • 20
  • 27