2

The problem to solve

Solve k-combination whose input may contain repeated items, and return only unique combinations.

e.g input is [0,1,2,2,4], k = 2, the result is:

{(0, 1), (2, 4), (1, 2), (0, 4), (1, 4), (0, 2), (2, 2)}


The solution I can think of

The only general solution I can think is to use a set or map to extract unique results.

e.g with python

from itertools import combinations
print (set(combinations([0, 1, 2, 2, 4], 2)))

Questions

  • Is there a general solution to do this, without using a set/map for filtering.

I can generate combinations of unique items, in order (refer: https://stackoverflow.com/a/76048486), but if there are repeated items, then there will be repeated combinations.

I've read this: https://math.stackexchange.com/a/554237 , seems it says there is no general solution to get all unique combinations, though there do have a formula to get count of unique combinations.

But, I'm not sure.


@Update - Summary & tips

  • According to the answers, there are ways both via iteration & recursion to do this.
  • When translating from python to go:
    • yield in python, need chan in go, or u need return all combinations (which can be huge for large input).
    • python's else for for and while is omitted when break.
    • list slicing in python will make a copy (of reference ?), while slicing a go slice will reuse the same underlying array, (so in go, if the slice being copied will be modified later, and u don't want that change in the copy, then u should copy/append it into a new slice).
Eric
  • 22,183
  • 20
  • 145
  • 196
  • Does this answer your question? [How to generate all distinct combinations (where input elements are repeated) in Python(using Itertools)?](https://stackoverflow.com/questions/55912262/how-to-generate-all-distinct-combinations-where-input-elements-are-repeated-in) – Joseph Wood Apr 19 '23 at 23:49
  • @JosephWood This is an `algorithm` question, and I don't see any fitting answer there. (I had btw made the same mistake, flagged as duplicate of [this](https://stackoverflow.com/q/70116916/12671057), and retracted when I realized this is not a `python` question...) – Kelly Bundy Apr 20 '23 at 00:00
  • @KellyBundy, btw didn't know about `more_itertools.distinct_combinations`. I'll have to check it out. – Joseph Wood Apr 20 '23 at 00:44
  • 1
    @JosephWood To me, `sympy.multiset_combinations` doesn't sound general. Can you use in in Java or Go? – Kelly Bundy Apr 20 '23 at 00:50
  • Using "set" simplifies the problem drastically, removes unnecessary complexity, so it is viable way. – MBo Apr 20 '23 at 03:23
  • @MBo When generate permutations, there is an algorithm to avoid duplicates without set, so I'm wondering is there a similar way for generating combination. Using set is simple, but may bring extra time & space in complexity. – Eric Apr 20 '23 at 07:42
  • @Eric Efforts for avoiding duplicates for large array might take much more extra time. Of course, it is needed to estimate balance. For example, we can sort items to provide easier duplicate usage prevention, but sorting itself is comparable with set construction. – MBo Apr 20 '23 at 07:57
  • @MBo In the case of permutation, there exists algorithm avoid duplication in nature, without extra time or space, it even keeps things in order. Check my answer with code: https://stackoverflow.com/a/75894614 , and the algorithm is explained here: https://en.wikipedia.org/wiki/Permutation#Generation_in_lexicographic_order Though I'm not sure whether same thing can be achieve for combination, and the question was asked trying to get an answer or tip. – Eric Apr 20 '23 at 08:00
  • @Eric Yes, I know about that algo for permutation. I wanted to say that it has sense because resulting permutation include all repeating items... but have just noticed that you want `(2, 2)` combination too - so set in not feasible here. Perhaps I did something like you need long ago, will try to look for. – MBo Apr 20 '23 at 08:17
  • [here](https://stackoverflow.com/a/68935843/844416) – MBo Apr 20 '23 at 08:24
  • @MBo Set works, I used `set()` in the question itself in the python example, it works well. And, I think https://stackoverflow.com/questions/75887690 this solution for permutation doesn't include duplicated permutation. – Eric Apr 20 '23 at 14:46
  • @Eric Aha, you are using set over resulting combinations - definitely overkill for large lists. – MBo Apr 20 '23 at 15:04
  • @MBo That's for example, we can put set inside function, and check in the loop while generating the return list. So that the set's size will be the same as the returned list. – Eric Apr 20 '23 at 15:06
  • @Eric Set storage, **generation all combinations** and checking them is not productive way. My link shows recursive method that doesn't require checking, it builds only needed combinations, as well as answers below. – MBo Apr 20 '23 at 15:11

3 Answers3

3

Here’s a non-recursive enumerator for combinations in lexicographic order, optimized for simplicity over speed. Knuth’s The Art of Computer Programming almost certainly has a speed-optimized version.

def combinations(a, k):
    a = sorted(a)
    while True:
        yield a[:k]
        for i in range(k - 1, -1, -1):
            # a[i + 1 :] is sorted. Sort a[i:].
            x = a[i]
            j = i + 1
            while j < len(a):
                if x < a[j]:
                    break
                a[j - 1] = a[j]
                j += 1
            a[j - 1] = x
            # a[j] is the next greatest element after x. If there are enough
            # elements after it, rotate them into position.
            if j + (k - i) <= len(a):
                rotate(a, i, j, j + (k - i))
                break
        else:
            break


def rotate(a, i, j, k):
    reverse(a, i, j)
    reverse(a, j, k)
    reverse(a, i, k)


def reverse(a, i, k):
    for j in range((k - i) // 2):
        a[i + j], a[(k - 1) - j] = a[(k - 1) - j], a[i + j]


print(*combinations([0, 1, 2, 2, 4], 2))
print(*combinations([0, 1, 2, 2, 4], 3))
print(*combinations([0, 0, 0, 1, 1, 1], 3))

Output:

[0, 1] [0, 2] [0, 4] [1, 2] [1, 4] [2, 2] [2, 4]
[0, 1, 2] [0, 1, 4] [0, 2, 2] [0, 2, 4] [1, 2, 2] [1, 2, 4] [2, 2, 4]
[0, 0, 0] [0, 0, 1] [0, 1, 1] [1, 1, 1]
David Eisenstat
  • 64,237
  • 7
  • 60
  • 120
  • I've translated your answer to go: https://go.dev/play/p/RyanyA38Vuc , as a generic version. Another go translation, that send combinations to channel: https://go.dev/play/p/P1_ik7w8j-Q , which is more like the yield feature in python. – Eric Apr 21 '23 at 09:29
1

Here is code that generates these one at a time in Python.

I'm using iterators. Translating away from that would be the hardest part of translating this to another language.

def unique_multisets (multiset, k):
    # Find the repetition of each item.
    count = {}
    for x in multiset:
        count[x] = 1 + count.get(x, 0)

    # Turn this into (value, count) pairs.  The reversed is
    # because we will reverse it later in generating the multisets.
    pairs = list(reversed(count.items()))

    # Calculate a running total.
    total = 0
    totals = []
    for p in pairs:
        total += p[1]
        totals.append(total)

    def recurse (partial, idx):
        if k == len(partial):
            yield partial
        elif idx < 0 or totals[idx] < k - len(partial):
            pass # Can't make combinations.
        else:
            (value, repeat) = pairs[idx]
            max_used = min(repeat, k - len(partial))
            for i in range(1 + max_used):
                partial.append(value)
            for i in range(1 + max_used):
                partial.pop()
                yield from recurse(partial, idx - 1)

    yield from recurse([], len(pairs)-1)

for x in unique_multisets([0, 1, 2, 2, 4], 2):
    print(x)
btilly
  • 43,296
  • 3
  • 59
  • 88
  • Looks correct, wondering is it possible to make it an `iteration`, (I'll try after fully understand it later). I head that every recursion have an iteration version, but not sure. – Eric Apr 20 '23 at 07:56
  • @Eric It goes the other way. Every iteration can be rewritten as recursion. But there are recursive functions that cannot be rewritten as iteration, the Ackermann function being the standard example. I can solve this one iteratively, but it will be a very different solution. – btilly Apr 20 '23 at 13:49
  • @btilly That doesn't sound right. Do you mean some restricted kind of "iteration"? – Kelly Bundy Apr 20 '23 at 14:12
  • @KellyBundy that exactly- you can compute any recursive function iteratively by emulating a call stack with segmenting the entry/exit points appropriately. Primitive recursive functions don't require this "hack"- you can determine the number of loops before entering the loop. Non-primitive recursive functions (like the ackermann function) require an explicit stack. – Dillon Davis Apr 20 '23 at 14:27
  • @KellyBundy I meant "bunch of `for` loops". With a call stack you can emulate anything - that's how we can run recursive functions in the first place. – btilly Apr 20 '23 at 14:52
  • Hmm, "bunch of for loops" is still unclear. But I doubt @Eric meant what you mean. – Kelly Bundy Apr 20 '23 at 15:00
  • According to https://stackoverflow.com/questions/931762 , multiple answers says every recursive function can be written in an iterative one. – Eric Apr 21 '23 at 09:48
1

Below is an algorithm I developed some time back for this task:

def multiset_combinations(v, m):
    
    v = sorted(v)
    f = [0]
    j = 0

    # Populating f, a list of expanded indices. E.g. For
    #
    #    v = [2, 2, 33, 33, 33, 40]
    #
    # f would be:
    #
    #    [0, 0, 1, 1, 1, 2]
    
    for i in range(1, len(v)):
        if v[i] != v[i - 1]:
            j += 1

        f.append(j)

    v = list(set(v))
    r = [0] * m
    n = len(v)
    
    z   = f[:m]
    idx = [0] * n
    p   = len(f) - m
    
    last = f[-m:]
    last[-1] += 1
    
    # Find the first occurence of each index. E.g. For the
    # same example above, idx would be:
    #
    #   [0, 2, 5]
    
    for i in range(n):
        idx[i] = f.index(i)
    
    # The main algorithm

    while True:
        while z[-1] < n:
            for j in range(m):
                r[j] = v[z[j]]

            z[-1] += 1
            yield r.copy()

        if z == last:
            break
        
        for i in range(m - 2, -1, -1):
            if z[i] != f[p + i]:
                z[i] += 1
                
                for j, k in zip(range(i + 1, m),
                                range(idx[z[i]] + 1, idx[z[i]] + m)):
                    z[j] = f[k]
                
                break

Calling it:

print(*multiset_combination([0, 1, 2, 2, 4], 2))
#> [0, 1] [0, 2] [0, 4] [1, 2] [1, 4] [2, 2] [2, 4]

print(*multiset_combination([0, 1, 2, 2, 4], 3))
#> [0, 1, 2] [0, 1, 4] [0, 2, 2] [0, 2, 4] [1, 2, 2] [1, 2, 4] [2, 2, 4]

print(*multiset_combination([0, 0, 0, 1, 1, 1], 3))
#> [0, 0, 0] [0, 0, 1] [0, 1, 1] [1, 1, 1]

It is pretty efficient as well. It can generate 753564 in just under a second:

v1 = [1,
      2, 2,
      3, 3, 3,
      4, 4, 4, 4,
      5, 5, 5, 5, 5,
      6,
      7, 7,
      8, 8, 8,
      9, 9, 9, 9,
      10, 10, 10, 10, 10,
      11,
      12, 12,
      13, 13, 13,
      14, 14, 14, 14,
      15, 15, 15, 15, 15]
      
def get_time(v, m):
    t1 = time.time()
    list(multiset_combinations(v, m))
    t2 = time.time()
    return t2 - t1
    
get_time(v1, 10)
#> 0.8702290058135986
Joseph Wood
  • 7,077
  • 2
  • 30
  • 65
  • 1
    Your python version finishes in `1.38 s`, the one from another answer https://stackoverflow.com/a/76064642 finishes in `9.93 s`, on my laptop, so indeed yours is faster. Though go is much faster than python seems. – Eric Apr 21 '23 at 13:33
  • 1
    I've translated your answer into `go`: https://go.dev/play/p/jdQvr6c5b0g , it finishes in `0.12 s` without print, while my translation for the other answer https://go.dev/play/p/RyanyA38Vuc takes `0.20 s` on my laptop. So, yours is still faster. BTW, the above link can't finish in https://go.dev/play probably due to it takes too much resource, need to run it locally. – Eric Apr 21 '23 at 18:06
  • 1
    I added another `go` translation which send combinations to go channel, which is more like `yield` in `python`: https://go.dev/play/p/3WivvzV2E0u , now it can be run online, it takes `0.15 s` on my laptop, and it's generic. – Eric Apr 21 '23 at 18:29