
I have a list of items, e.g. [1,1,1,1,2,2], and I am trying to find all the unique groups where these items are bundled into tuples of length one or two. For example, for the group above, I would like to find the following 10 possible groupings:

[[(1,),(1,),(1,),(1,),(2,),(2,)],
 [(1,1),(1,),(1,),(2,),(2,)],
 [(1,2),(1,),(1,),(1,),(2,)],
 [(2,2),(1,),(1,),(1,),(1,)],
 [(1,1),(1,1),(2,),(2,)],
 [(1,1),(1,2),(1,),(2,)],
 [(1,2),(1,2),(1,),(1,)],
 [(2,2),(1,1),(1,),(1,)],
 [(1,1),(1,1),(2,2)],
 [(1,1),(1,2),(1,2)]]

I've been playing with itertools, but can only manage to use it to find unique possible tuples (e.g. `set(itertools.combinations((1,1,1,1,2,2), 2))`), and any searches I do turn up solutions where the size of each group is constant and/or duplication of elements is not considered (example1, example2).

Ultimately, I am looking for a solution that will work for cases that are all ones ([1,1,1,...,1]), all twos ([2,2,2,...,2]) or some intermediate combination that includes an arbitrary number of 1s and 2s.
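To show what I mean, here is the itertools attempt I mentioned; it only gives me the unique pairs themselves, not complete groupings of the whole list:

```python
import itertools

# Unique tuples of length 2 -- but this says nothing about how the
# remaining items are grouped.
xs = (1, 1, 1, 1, 2, 2)
pairs = sorted(set(itertools.combinations(xs, 2)))
print(pairs)  # [(1, 1), (1, 2), (2, 2)]
```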

Cœur
ramzeek
  • This is not all of the orderings. Even if we restrict the numbers to be in their own slots, you have 6!/(2!*4!)=15 pairs...Perhaps I'm misunderstanding you, but this seems very unclear to me. – Steve P. Nov 16 '13 at 04:05
  • How many elements can the original list have? That's crucial, because the number of partitions [grows very quickly](http://oeis.org/A000110) as the list gets longer. If the list is always "pretty small", then we can look at ways that post-process a full set of partitions, to throw out duplicates. Otherwise you need a fancier custom generator that doesn't generate duplicates to begin with. – Tim Peters Nov 16 '13 at 04:37

1 Answer


As I noted in a comment, the largest length of the input list is crucial. Here's example code that solves the specific example you gave quickly enough, by post-processing the full set of partitions (to weed out duplicates, and to weed out partitions with pieces that are "too large"). But it would be horridly inefficient for "long" original lists:

def part(xs):  # generate all partitions of xs
    xs = tuple(xs)
    n = len(xs)
    def extend(i):
        if i == n:
            yield ()
            return
        this = xs[i]
        for o in extend(i+1):
            yield ((this,),) + o
            for j, p in enumerate(o):
                yield o[:j] + ((this,) + p,) + o[j+1:]
    for o in extend(0):
        yield o

def upart(xs):  # weed out dups, and partitions with a piece bigger than 2
    from collections import Counter
    seen = []
    for p in part(xs):
        if all(len(chunk) <= 2 for chunk in p):
            c = Counter(p)
            if c not in seen:
                seen.append(c)
                yield p

xs = [1,1,1,1,2,2]
for o in upart(xs):
    print(o)

That displays the 10 unique partitions you're looking for.

BTW, for xs = [1,1,1,1,1,1] it produces:

((1,), (1,), (1,), (1,), (1,), (1,))
((1, 1), (1,), (1,), (1,), (1,))
((1, 1), (1, 1), (1,), (1,))
((1, 1), (1, 1), (1, 1))
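That agrees with a simple closed form: for a run of n identical items, a grouping is determined entirely by how many pairs it contains (0 through n // 2), so there are n // 2 + 1 of them. A quick standalone sketch (not part of the code above):

```python
# For n copies of the same item, a grouping is fixed by its number of
# pairs k (0 <= k <= n // 2); the remaining n - 2k items are singletons.
def groupings_of_identical(n, item=1):
    return [[(item, item)] * k + [(item,)] * (n - 2 * k)
            for k in range(n // 2 + 1)]

for g in groupings_of_identical(6):
    print(g)  # 4 groupings, matching the output above
```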

A Custom Generator

As also noted in the comment, if post-processing the results of general building blocks is too inefficient, you need to "roll your own" from scratch. Here's one way that's very space-efficient, building unique results by construction (rather than by post-processing). There's really no "general way" of doing this - it requires analyzing the specific problem at hand, and writing code to exploit any quirks you can find:

def custom_gen(xs):
    from collections import Counter
    assert all(1 <= i <= 2 for i in xs)
    # There are only 5 unique pieces that can be used:
    pieces = [(1,), (2,), (1, 1), (2, 2), (1, 2)]
    countpieces = {piece: Counter(piece) for piece in pieces}

    def extend(i, n1, n2, result):
        # try all ways of extending with pieces[i];
        # there are n1 1's and n2 2's remaining to be used
        assert n1 >= 0 and n2 >= 0
        if n1 == n2 == 0:
            yield result
            return
        if i == len(pieces):  # dead end
            return
        piece = pieces[i]
        c = countpieces[piece]
        p1 = c[1]
        p2 = c[2]
        # What's the most number of this piece we could
        # possibly take?
        assert p1 or p2
        if p1:
            if p2:
                most = min(n1 // p1, n2 // p2)
            else:
                most = n1 // p1
        else:
            most = n2 // p2
        for count in range(most + 1):
            for t in extend(i+1,
                            n1 - count * p1,
                            n2 - count * p2,
                            result + [piece] * count):
                yield t

    c = Counter(xs)
    for t in extend(0, c[1], c[2], []):
        yield t

Note that the recursion never goes more than 5 deep (no matter how long the input list), so I'd wager this is about the most efficient that can be done without deeper analysis of the mathematics of the problem.
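As a cross-check (a standalone counting sketch, not the code above), the same recursion over piece counts can simply count the groupings instead of building them; for four 1s and two 2s it agrees with the 10 groupings listed in the question:

```python
from functools import lru_cache

# The five possible pieces, recorded as (number of 1's, number of 2's):
# (1,), (2,), (1, 1), (2, 2), (1, 2)
PIECES = [(1, 0), (0, 1), (2, 0), (0, 2), (1, 1)]

@lru_cache(maxsize=None)
def count_groupings(i, n1, n2):
    """Count multisets of PIECES[i:] using exactly n1 1's and n2 2's
    (mirrors the recursion in custom_gen, but only counts)."""
    if n1 == 0 and n2 == 0:
        return 1
    if i == len(PIECES):  # dead end: items left over, no pieces left
        return 0
    p1, p2 = PIECES[i]
    total = 0
    count = 0
    while count * p1 <= n1 and count * p2 <= n2:
        total += count_groupings(i + 1, n1 - count * p1, n2 - count * p2)
        count += 1
    return total

print(count_groupings(0, 4, 2))  # 10, as in the question
```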

Tim Peters
  • Thanks! This definitely does the trick. My problem does grow a little (max length is 12), and for my requirements the few seconds it takes to run is completely acceptable. – ramzeek Nov 17 '13 at 02:41
  • `custom_gen()` should run in less than an eyeblink for max length 12. The original (post-processing) one has to dig through 4,213,597 partitions to find the relative handful that succeed. That's why it takes a few seconds ;-) – Tim Peters Nov 17 '13 at 02:46