0

You can easily generate all possible combinations of the elements of a list using itertools.combinations. I am interested in then sampling only a handful of these as efficiently as possible. In some cases, this will mean having millions of combinations and only needing a handful of randomly selected ones.

import itertools

combinations = itertools.combinations(range(1, 30), 10)

I would like to sample e.g. 4 combinations out of all combinations stored in combinations.

EDIT: previous results can be found here but do not strike me as very efficient.

Mark Verhagen
  • 239
  • 1
  • 11
  • thanks @Chris it doesn't strike me as overly efficient – Mark Verhagen Jul 29 '20 at 14:01
  • 1
    You might `random.sample` indices, and then loop over the combinations **without creating an actual list** and choose the elements matching the indices. – Tomerikoo Jul 29 '20 at 14:09
  • I would consider reopening this question. It's not the same as the linked question. The OP wants **more than one** random combination, while the linked question is about **a single** random combination. – ChaimG Mar 07 '22 at 06:33

1 Answers1

2

Just picking 10 samples each time from the list would be most efficient way

>>> import random
>>> l = range(1, 30)
>>> 
>>> res = [random.sample(l, 10) for _ in range(4)]
>>> pprint(res)
[[23, 25, 27, 9, 8, 19, 3, 16, 26, 7],
 [6, 16, 5, 8, 22, 20, 15, 10, 12, 13],
 [28, 20, 3, 21, 29, 12, 7, 23, 2, 10],
 [18, 13, 29, 23, 19, 10, 27, 7, 17, 20]]
Prem Anand
  • 2,469
  • 16
  • 16
  • thanks, the point is that the list is extremely large. I'd rather not transform the iterable into a list since this would be highly inefficient. – Mark Verhagen Jul 29 '20 at 13:59
  • Ah I see, you generate the combinations directly but this does not guarantee uniqueness of the resulting combinations, correct? – Mark Verhagen Jul 29 '20 at 14:02
  • I have removed the unnecessary conversion to list – Prem Anand Jul 29 '20 at 14:03
  • @MarkVerhagen `random.sample` would always return unique random elements – Prem Anand Jul 29 '20 at 14:05
  • @PremAnand he means that 2 different calls to `sample` *might* return the exact same combination... – Tomerikoo Jul 29 '20 at 14:06
  • 1
    I get it now. But as we are generating a small list of samples, we can check if a newly generated sample matches an already generated one and regenerate if needed – Prem Anand Jul 29 '20 at 14:09
  • yes, I agree, but I am still interested primarily in efficiently sampling an iterable, ideally without having to evaluate duplicates and possibly re-sample – Mark Verhagen Jul 29 '20 at 14:12
  • 1
    @MarkVerhagen Sounds like [reservoir sampling](https://en.wikipedia.org/wiki/Reservoir_sampling) then. – superb rain Jul 29 '20 at 14:18