
Given a combination of k of the first n natural numbers, I need to find the position of that combination among those returned by itertools.combinations(range(1,n),k) (the reason is that this way I can use a list instead of a dict to access values associated with each combination, knowing the combination).

Since itertools yields its combinations in a regular pattern it is possible to do this (and I also found a neat algorithm), but I'm looking for an even faster/more natural way that I might be unaware of.

By the way here is my solution:

def find_idx(comb,n):
    k=len(comb)
    idx=0
    last_c=0
    for c in comb:
        #idx+=sum(nck(n-2-x,k-1) for x in range(c-last_c-1)) # a little faster without nck caching
        idx+=nck(n-1,k)-nck(n-c+last_c,k) # more elegant (thanks to Ray), faster with nck caching
        n-=c-last_c
        k-=1
        last_c=c
    return idx

where nck(n, k) returns the binomial coefficient C(n, k).
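For completeness, here is a minimal nck sketch that makes the snippet above runnable; it assumes Python 3.8+ for math.comb (any O(k) implementation works just as well):

import math

def nck(n, k):
    # binomial coefficient C(n, k); returns 0 when k is out of range
    if k < 0 or n < k:
        return 0
    return math.comb(n, k)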

For example:

import itertools
comb=list(itertools.combinations(range(1,14),6))[654] # pick the combination at index 654
find_idx(comb,14) # -> 654

And here is an equivalent but maybe less involved version (I actually derived the previous one from this one). I treated the integers of the combination as positions of 1s in a binary number, built a binary tree while parsing the 0/1 digits, and found a regular pattern of index increments during parsing:

def find_idx(comb,n):
    k=len(comb)
    b=bin(sum(1<<(x-1) for x in comb))[2:]
    idx=0
    for s in b[::-1]:
        if s=='0':
            idx+=nck(n-2,k-1)
        else:
            k-=1
        n-=1
    return idx
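A quick exhaustive check of either version against itertools, with small sample sizes of my own choosing:

from itertools import combinations

n, k = 10, 4  # small enough for an exhaustive check
for idx, comb in enumerate(combinations(range(1, n), k)):
    assert find_idx(comb, n) == idx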
mmj
  • +1 It might be helpful to take a look at the "source code" of `itertools.combinations`: http://docs.python.org/2/library/itertools.html#itertools.combinations – Joel Cornett Jan 22 '13 at 09:54
  • These look like iterative versions of the [unchoose](http://stackoverflow.com/a/3143594/487339) ranker I keep in the toolbox. `find_idx` seems faster than `unchoose`, which isn't surprising, as Python recursion tends to be slow. – DSM Jan 22 '13 at 10:06
  • Thinking out loud: would [combinatorial numbers](https://en.wikipedia.org/wiki/Combinatorial_number_system) help here? – Fred Foo Jan 22 '13 at 17:03
  • @larsmans Great link, thanks. Nevertheless I found a difference between the lexicographic ordering explained in Wikipedia and the one used by itertools: the former assumes decreasing order of the chosen items, the latter increasing order. I'm trying to reduce my algorithm to something analogous (it's not that far off) to the elegant and fast one in Wikipedia, but I don't know whether that's feasible, due to this difference (a closed-form sketch of the idea follows below). – mmj Jan 22 '13 at 21:49
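For reference, a possible closed form along the lines of that comment (my own sketch: it adapts the combinatorial number system to itertools' increasing order and uses math.comb, so it is an assumption rather than part of the original posts):

import math
from itertools import combinations

def find_idx_cns(comb, n):
    # rank of comb among itertools.combinations(range(1, n), k),
    # obtained from the combinatorial number system by complementation
    k = len(comb)
    m = n - 1  # number of selectable items (1 .. n-1)
    return math.comb(m, k) - 1 - sum(math.comb(m - c, k - i)
                                     for i, c in enumerate(comb))

# exhaustive check against itertools for a small case
assert all(find_idx_cns(c, 8) == i
           for i, c in enumerate(combinations(range(1, 8), 3)))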

3 Answers


Your solution seems quite fast. In find_idx you have two for loops; the inner one (the sum) can be optimized using the formula:

C(n, k) + C(n-1, k) + ... + C(n-r, k) = C(n+1, k+1) - C(n-r, k+1)

So you can replace sum(nck(n-2-x,k-1) for x in range(c-last_c-1)) with nck(n-1, k) - nck(n-c+last_c, k).
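A quick spot-check of that substitution with sample values of my own choosing (math.comb needs Python 3.8+; the nck_safe below behaves the same on older versions):

import math

n, k, c, last_c = 14, 6, 5, 2  # arbitrary sample values
lhs = sum(math.comb(n - 2 - x, k - 1) for x in range(c - last_c - 1))
rhs = math.comb(n - 1, k) - math.comb(n - c + last_c, k)
assert lhs == rhs  # both evaluate to 1254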

I don't know how you implement your nck(n, k) function, but it should be O(k) in time complexity. Here is my implementation:

from operator import mul
from functools import reduce # In python 3
def nck_safe(n, k):
    if k < 0 or n < k: return 0
    return reduce(mul, range(n, n-k, -1), 1) // reduce(mul, range(1, k+1), 1)

With this, your solution becomes O(k^2) without recursion. It's quite fast since k won't be too large.

Update

I've noticed that nck's parameters are (n, k), and neither n nor k will be too large. We may speed up the program by caching.

def nck(n, k, _cache={}):
    if (n, k) in _cache: return _cache[n, k]
    result = nck_safe(n, k)  # or any other way of computing C(n, k)
    # before returning the result, store it in the cache
    _cache[n, k] = result
    return result

In Python 3 this can be done with the functools.lru_cache decorator:

import functools

@functools.lru_cache(maxsize=500)
def nck(n, k):
    return nck_safe(n, k)  # or any other way of computing C(n, k)
Ray
  • Great "theoretical" improvement, I was looking exactly for such a formula. Unfortunately it doesn't seem to bring much performance gain: I ran some tests and, with great disappointment, found no speed improvement compared to the original version with `sum`. I guess that, after all, it takes more cycles to compute the 2 `nck` than the several `nck` of the `sum` in the original version. By the way, I updated the code to reflect your "theoretical" optimization. – mmj Feb 04 '13 at 11:42
  • Next optimization step should be to find a similar formula even for the main loop. – mmj Feb 04 '13 at 11:50
  • What a pity that the trick doesn't work well. I've introduced another approach in my updated answer. The two methods can be combined. – Ray Feb 05 '13 at 14:44
  • Caching surely is a possibility when the ranges of n and k are small (I actually used it in my real implementation), and in that case your improvement is useful because it always needs just 2 lookups. – mmj Feb 06 '13 at 20:59

I dug up some old code (it has since been converted to Python 3 syntax) that includes the function combination_index, which does what you request:

def fact(n, _f=[1, 1, 2, 6, 24, 120, 720]):
    """Return n!
    The “hidden” list _f acts as a cache"""
    try:
        return _f[n]
    except IndexError:
        while len(_f) <= n:
            _f.append(_f[-1] * len(_f))
        return _f[n]

def indexed_combination(n: int, k: int, index: int) -> tuple:
    """Select the 'index'th combination of k over n
    Result is a tuple (i | i∈{0…n-1}) of length k

    Note that if index ≥ binomial_coefficient(n,k)
    then the result is almost always invalid"""

    result= []
    for item, n in enumerate(range(n, -1, -1)):
        pivot= fact(n-1)//fact(k-1)//fact(n-k)
        if index < pivot:
            result.append(item)
            k-= 1
            if k <= 0: break
        else:
            index-= pivot
    return tuple(result)

def combination_index(combination: tuple, n: int) -> int:
    """Return the index of combination (length == k)

    The combination argument should be a sorted sequence (i | i∈{0…n-1})"""

    k= len(combination)
    index= 0
    item_in_check= 0
    n-= 1 # to simplify subsequent calculations
    for offset, item in enumerate(combination, 1):
        while item_in_check < item:
            index+= fact(n-item_in_check)//fact(k-offset)//fact(n+offset-item_in_check-k)
            item_in_check+= 1
        item_in_check+= 1
    return index

def test():
    for n in range(1, 11):
        for k in range(1, n+1):
            max_index= fact(n)//fact(k)//fact(n-k)
            for i in range(max_index):
                comb= indexed_combination(n, k, i)
                i2= combination_index(comb, n)
                if i2 != i:
                    raise RuntimeError("mismatching n:%d k:%d i:%d≠%d" % (n, k, i, i2))

indexed_combination does the inverse operation.
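A round-trip usage sketch, with values I picked for illustration:

comb = indexed_combination(10, 4, 100)  # the combination at index 100 out of C(10,4) = 210
assert combination_index(comb, 10) == 100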

PS: I remember that I once attempted to remove all those fact calls (by substituting appropriate incremental multiplications and divisions), but the code became much more complicated and wasn't actually faster. A speedup was achievable by substituting a pre-calculated list of factorials for the fact function, but again the speed difference was negligible for my use cases, so I kept this version.

tzot
  • You can just use stdlib [`math.factorial`](https://docs.python.org/3/library/math.html#math.factorial). And probably using [`math.comb`](https://docs.python.org/3/library/math.html#math.comb) then you won't even need the factorial function (it looks like using `fact` just as a means to compute nCr, correct me if I'm wrong). – wim Oct 08 '22 at 01:18

Looks like you need to specify your task better, or I'm just getting it wrong. It seems to me that while you iterate through itertools.combinations you can save the indexes you need in an appropriate data structure. If you need all of them, I would go with a dict (one dict for all your needs):

combinationToIdx = {}
for (idx, comb) in enumerate(itertools.combinations(range(1,14),6)):
    combinationToIdx[comb] = idx

def findIdx(comb):
    return combinationToIdx[comb]
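
Usage sketch (the first combination yielded by itertools has index 0):

findIdx((1, 2, 3, 4, 5, 6))  # -> 0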
varepsilon
  • Using `dict` is more time and memory consuming than using `list`. Moreover, if you have (like I do) billions of combinations, storing all comb/index pairs might need more memory and time than you have, whereas a function needs a few bytes of memory and you spend time computing the index only when you need it. – mmj Jan 23 '13 at 09:38