3

I need to be able to display a set of 1200 random combinations out of 36 nCr 10. Since there are 254,186,856 combinations from 36 nCr 10, I guess I won't be able to put all of those in a Python list.

How can I solve this issue? Should I use something other than Python, or look for a different algorithm? (I'm using this one right now: http://docs.python.org/library/itertools.html?highlight=itertools.combinations#itertools.combinations)

EDIT: The combinations must not be duplicates, as it wouldn't be an nCr problem anymore. Just thought I'd clarify that.

Here's the code so far...

def combinations(iterable, r):
# combinations('ABCD', 2) --> AB AC AD BC BD CD
# combinations(range(4), 3) --> 012 013 023 123
pool = tuple(iterable)
n = len(pool)
if r > n:
    return
indices = range(r)
yield tuple(pool[i] for i in indices)
while True:
    for i in reversed(range(r)):
        if indices[i] != i + n - r:
            break
    else:
        return
    indices[i] += 1
    for j in range(i+1, r):
        indices[j] = indices[j-1] + 1
    yield tuple(pool[i] for i in indices)

if __name__ == '__main__':
    teamList = list(combinations(range(36), 10))

After that, Python uses 2+ GB of my RAM but never seems to finish the computations.

rk919
  • 307
  • 2
  • 6
  • 13
  • 2
    Show us your current itertools.combinations implementation. I believe we will all agree itertools is the best approach, but if you're concerned about optimization, we can help there only when we see your implementation. – hexparrot Mar 26 '12 at 15:27
  • I think you need to get a function which maps ints in range(254...) to combinations (without generating all previous combinations) and then use a random choice of integers in that range to select the combinations. Some of the other "obvious" solutions create all or most of the combos in the background -- which isn't feasible. – Aaron Watters Mar 26 '12 at 15:35
  • Must all the combinations have equal probability? – John La Rooy Mar 26 '12 at 18:15
  • 1
    This is a variation on what is fast becoming an FAQ here on SO. For example, refer to http://stackoverflow.com/questions/1776442/nth-combination. – High Performance Mark Mar 27 '12 at 08:27

6 Answers6

2

Am I under-thinking this?

from random import sample

dataset = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
for i in xrange(1200):
  print sample(dataset,10)
MattH
  • 37,273
  • 11
  • 82
  • 84
  • You have to beware of duplicate samples, but if you keep a list (or, maybe better, a set of tuples) you can solve that easily enough. – DSM Mar 26 '12 at 15:28
  • Wasn't sure whether the requirements precluded duplicates. If each of the 1200 is supposed to be random, than filtering previously seen means subsequent results aren't random. – MattH Mar 26 '12 at 15:30
  • I got the sense it did and you got the sense it didn't, so it must be vague. :^) But I disagree that removing dups would mean subsequent results aren't random: when I draw Scrabble letters from a bag, I'd still call the second draw random, even if I knew that it couldn't be an X. – DSM Mar 26 '12 at 15:34
  • @DSM: Guess we have a different sense about random too! :) By your reckoning counting cards at blackjack doesn't increase a would-be cheater's chances of winning. Perhaps we could be both right if I made the slight amendment of "subsequent results aren't *as* random". – MattH Mar 26 '12 at 15:44
  • 1
    @MattH They're just as random. After `sample` picks `A` from `dataset`, it can't pick `A` again, does that make the rest of the selections less random? The population changes, not the randomness. – agf Mar 26 '12 at 15:46
  • @agf: Every time you take a random sample and remove it from the population for subsequent samples you have reduced the population and reduced entropy. If there are 254,186,856 possible cominations and you've taken and excluded 254,186,855 samples, do you think the final sample is random? – MattH Mar 26 '12 at 15:54
  • @MattH That's not the correct (technical) way to use the word "random". Yes, it's a random selection from a population of one. The conditional probability of that sample being picked last, given that it wasn't picked any of the other times, is 100%. But the chance of it being picked last is the same as the chance of any other being picked last, and the same as the chance of any being picked at any selected point. You're conflating conditional probability with randomness. – agf Mar 26 '12 at 15:58
1

You cannot use random.sample directly on combinations iterator, but you can use it to create random indices:

indices = random.sample(xrange(number_of_combinations), 1200)
comb = itertools.combinations(range(36), 10)

prev = 0
for n in sorted(indices):
    print next(itertools.islice(comb, n-prev, None))
    prev = n

random.sample will select each index only once, so you don't have to worry about duplicates. Also it doesn't need to generate 254,186,856 indexes to select 1200 of them.

If you have SciPy, you can easily calculate number of combinations using scipy.misc.comb, which is fast and efficient way to calculate it:

number_of_combinations = scipy.misc.comb(36, 10, exact=True)

Otherwise you can use this snippet:

def number_of_combinations(n, k):
    if k < 0 or k > n:
        return 0
    if k > n - k: # take advantage of symmetry
        k = n - k
    c = 1
    for i in range(k):
        c = c * (n - (k - (i+1)))
        c = c // (i+1)
    return c
vartec
  • 131,205
  • 36
  • 218
  • 244
0

Note that:

  • The list which we do not want to produce, if we did produce it, could be sorted
  • It's possible to produce the combination that would reside at a given index in that sorted list without actually producing the list (see iterCombination, below)
  • It's possible to sample a range without producing the entire range as a collection

As such, without producing any length 36C10 list, we can sample 1200 distinct indices from the 36C10 possible indices and then produce the combinations that would reside at those indices if we had produced and sorted the list of all combinations.

With the below code, this is a generator that will yield 1200 distinct samples:
combinations.iterRandomSampleOfCombinations(list(range(36)), 10, 1200)

Combination at an index in a lexicographically sorted list of all combinations of nCk
(From my answer to Nth combination):

def iterCombination(index, n, k):
    '''Yields the items of the single combination that would be at the provided
    (0-based) index in a lexicographically sorted list of combinations of choices
    of k items from n items [0,n), given the combinations were sorted in 
    descending order. Yields in descending order.
    '''
    nCk = 1
    for nMinusI, iPlus1 in zip(range(n, n - k, -1), range(1, k + 1)):
        nCk *= nMinusI
        nCk //= iPlus1
    curIndex = nCk
    for k in range(k, 0, -1):
        nCk *= k
        nCk //= n
        while curIndex - nCk > index:
            curIndex -= nCk
            nCk *= (n - k)
            nCk -= nCk % k
            n -= 1
            nCk //= n
        n -= 1
        yield n

A random sample of combinations without creating a list of combinations:

import random

def choose(n, k):
    '''Returns the number of ways to choose k items from n items'''
    reflect = n - k
    if k > reflect:
        if k > n:
            return 0
        k = reflect
    if k == 0:
        return 1
    for nMinusIPlus1, i in zip(range(n - 1, n - k, -1), range(2, k + 1)):
        n = n * nMinusIPlus1 // i
    return n

def iterRandomSampleOfCombinations(items, combinationSize, sampleSize):
    '''Yields a random sample of sampleSize combinations, each composed of
    combinationSize elements chosen from items.
    The sample is as per random.sample, thus any sub-slice will also be a valid
    random sample.
    Each combination will be a reverse ordered list of items - one could reverse
    them or shuffle them post yield if need be.
    '''
    n = len(items)
    if n < 1 or combinationSize < 1 or combinationSize > n:
        return
    nCk = choose(n, combinationSize)
    if sampleSize > nCk:
        return
    for sample in random.sample(range(nCk), sampleSize):
        yield [items[i] for i in iterCombination(sample, n, combinationSize)]

Example, a sample of 29 combinations of length 10 chosen from 36 items [A-Z] + [a-j]:

>>> items = [chr(i) for i in range(65, 91)] + [chr(i) for i in range(97, 107)]
>>> len(items)
36
>>> for combination in combinations.iterRandomSampleOfCombinations(items, 10, 29):
...     sampledCombination
...
['i', 'e', 'b', 'Z', 'U', 'Q', 'N', 'M', 'H', 'A']
['j', 'i', 'h', 'g', 'f', 'Z', 'P', 'I', 'G', 'E']
['e', 'a', 'Z', 'U', 'Q', 'L', 'G', 'F', 'C', 'B']
['i', 'h', 'f', 'Y', 'X', 'W', 'V', 'P', 'I', 'H']
['g', 'Y', 'V', 'S', 'R', 'N', 'M', 'L', 'K', 'I']
['j', 'i', 'f', 'e', 'd', 'b', 'Z', 'X', 'W', 'L']
['g', 'f', 'e', 'Z', 'T', 'S', 'P', 'L', 'J', 'E']
['d', 'c', 'Z', 'X', 'V', 'U', 'S', 'I', 'H', 'C']
['f', 'e', 'Y', 'U', 'I', 'H', 'D', 'C', 'B', 'A']
['j', 'd', 'b', 'W', 'Q', 'P', 'N', 'M', 'F', 'B']
['j', 'a', 'V', 'S', 'P', 'N', 'L', 'J', 'H', 'G']
['g', 'f', 'e', 'a', 'W', 'V', 'O', 'N', 'J', 'D']
['a', 'Z', 'Y', 'W', 'Q', 'O', 'N', 'F', 'B', 'A']
['i', 'g', 'a', 'X', 'V', 'S', 'Q', 'P', 'H', 'D']
['c', 'b', 'a', 'T', 'P', 'O', 'M', 'I', 'D', 'B']
['i', 'f', 'b', 'Y', 'X', 'W', 'V', 'U', 'M', 'A']
['j', 'd', 'U', 'T', 'S', 'K', 'G', 'F', 'C', 'B']
['c', 'Z', 'X', 'U', 'T', 'S', 'O', 'M', 'F', 'D']
['g', 'f', 'X', 'S', 'P', 'M', 'F', 'D', 'C', 'B']
['f', 'Y', 'W', 'T', 'P', 'M', 'J', 'H', 'D', 'C']
['h', 'b', 'Y', 'X', 'W', 'Q', 'K', 'F', 'C', 'B']
['j', 'g', 'Z', 'Y', 'T', 'O', 'L', 'G', 'E', 'D']
['h', 'Z', 'Y', 'S', 'R', 'Q', 'H', 'G', 'F', 'E']
['i', 'c', 'X', 'V', 'R', 'P', 'N', 'L', 'J', 'E']
['f', 'b', 'Z', 'Y', 'W', 'V', 'Q', 'N', 'G', 'D']
['f', 'd', 'c', 'b', 'V', 'T', 'S', 'R', 'Q', 'B']
['i', 'd', 'W', 'U', 'S', 'O', 'N', 'M', 'K', 'G']
['g', 'f', 'a', 'W', 'V', 'T', 'S', 'R', 'H', 'B']
['g', 'f', 'a', 'W', 'T', 'S', 'O', 'L', 'K', 'G']

Note: The combinations themselves are (reverse) ordered, but the sample is random (as is any slice thereof since random.sample was employed on a range object), if a random order is also required simply perform random.shuffle(combination).

It's pretty fast too (although maybe a different ordering of combinations could be faster?):

>>> samples = 1000
>>> sampleSize = 1200
>>> combinationSize = 10
>>> len(items)
36
>>> while 1:
...     start = time.clock()
...     for i in range(samples):
...             for combination in iterRandomSampleOfCombinations(items, combinationSize, sampleSize):
...                     pass
...     end = time.clock()
...     print("{0} seconds per length {1} sample of {2}C{3}".format((end - start)/samples, sampleSize, len(items), combinationSize))
...     break
...
0.03162827446371375 seconds per length 1200 sample of 36C10
Jonathan Allan
  • 427
  • 6
  • 10
0

Try the following implementation.

>>> def nCr(data,r,size):
    result=set()
    while len(result) < size:
        result.add(''.join(random.sample(data,r)))
    return list(result)

In order to generate 1200 unique samples of 36Cr for a given data of length of 36 you may do something similar

>>> data = string.ascii_letters[:36]
>>> print nCr(data,10,1200)
Abhijit
  • 62,056
  • 18
  • 131
  • 204
  • 2
    I think you need to sort the items in the set (or use frozensets), otherwise you can get duplicates that differ in ordering – John La Rooy Mar 26 '12 at 18:26
  • As @JohnLaRooy points out the current implementation can contain duplicates; to see this try running: sorted(sorted(item) for item in nCr(data[:6], 2, 15)) once or twice and you will probably see duplicates such as ['A', 'B'] when the result should be one of each possible combination (15 = 6-choose-2). If you resolve this (e.g. add a sorted tuple rather than joined string) as suggested then the function will get slow for large data and r approaching len(data) as it will keep randomly selecting previously chosen combinations. – Jonathan Allan Apr 04 '16 at 19:51
  • ...it also will quite happily produce results for size > len(data)-choose-r, so if you run nCr(data[:3], 2, 6) you will certainly see duplicates - you will see all six possible ordered combinations, while there are only three combinations; change 6 to 7 and it will spin trying to find a seventh distinct item that it will never find. – Jonathan Allan Apr 04 '16 at 20:22
0

You can use the technique where you iterate through the entire sequence, but slightly increase the probability that an element will be picked for each step. This will yield an unbiased random sample, with 1200 (depending on what probability you pick) elements.

It will force you to generate more or less the entire sequence, but you will not have to put the elements into memory. See for example: http://www.javamex.com/tutorials/random_numbers/random_sample.shtml

Gurgeh
  • 2,130
  • 15
  • 28
0

I think this will give you the answer you are looking for:

I am iterating over an generator that contains all possible combinations, and picking out n number of them at random.

from itertools import combinations
import random as rn
from math import factorial
import time

def choose_combos(n,r,n_chosen):

    total_combs = factorial(n)/(factorial(n-r)*factorial(r))

    combos = combinations(range(n),r)

    chosen_indexes = rn.sample(xrange(total_combs),n_chosen)

    random_combos = []

    for i in xrange(total_combs):
        ele = combos.next()
        if i in chosen_indexes:
            random_combos.append(ele)

    return random_combos

start_time = time.time()

print choose_combos(36,10,5) #you would have to use 1200 instead of 5

print 'time taken', time.time() - start_time

It will take some time, but it is not insane:

[(0, 6, 10, 13, 22, 27, 28, 29, 30, 35), (3, 4, 11, 12, 13, 16, 19, 26, 31, 33), (3, 7, 8, 9, 11, 19, 20, 23, 28, 30), (3, 10, 15, 19, 23, 24, 29, 30, 33, 35), (7, 14, 16, 20, 22, 25, 29, 30, 33, 35)]

time taken 111.286000013
Akavall
  • 82,592
  • 51
  • 207
  • 251
  • Yes, that works just fine except for the fact that that still takes too long to compute. I need it to compute this within an hour, tops. – rk919 Mar 27 '12 at 03:23
  • From the post it looks like it takes under 2 minutes. Computers are fast ;) – Gurgeh Mar 27 '12 at 08:17