2

I have a list of unique elements, let's say [1,2], and I want to split it onto k=2 sublists. Now I want to have all possible sublists:

[ [ [1,2],[] ], [ [1],[2] ], [ [2],[1] ], [ [],[1,2] ] ]

And I want to split onto 1<=k<=n sublists, so for k=1 it will be:

[ [1, 2] ]

How can I do that with Python 3?

UPDATE: my goal is to get all possible partitions of list of N unique numbers, where each partition will have k sublists. I would like to show better example than I have shown upper, I hope I will not miss something. So for list [1, 2, 3] and for k=2 I want to have next list:

[
[ [1,2,3], [] ],
[ [2,3], [1] ],
[ [1,3], [2] ],
[ [1,2], [3] ],
[ [1], [2,3] ],
[ [2], [1,3] ],
[ [3], [2,3] ],
[ [], [1,2,3] ]
] 

UPDATE 2: so far I have combined two suggestions and little modification got next code:

def sorted_k_partitions(seq, k):
    """Returns a list of all unique k-partitions of `seq`.

    Each partition is a list of parts, and each part is a tuple.

    The parts in each individual partition will be sorted in shortlex
    order (i.e., by length first, then lexicographically).

    The overall list of partitions will then be sorted by the length
    of their first part, the length of their second part, ...,
    the length of their last part, and then lexicographically.
    """
    n = len(seq)
    groups = []  # a list of lists, currently empty

    def generate_partitions(i):
        if i >= n:
            yield list(map(tuple, groups))
        else:
            if n - i > k - len(groups):
                for group in groups:
                    group.append(seq[i])
                    yield from generate_partitions(i + 1)
                    group.pop()

            if len(groups) < k:
                groups.append([seq[i]])
                yield from generate_partitions(i + 1)
                groups.pop()

    result = generate_partitions(0)

    # Sort the parts in each partition in shortlex order
    result = [sorted(ps, key = lambda p: (len(p), p)) for ps in result]
    # Sort partitions by the length of each part, then lexicographically.
    result = sorted(result, key = lambda ps: (*map(len, ps), ps))

    return result

With this function I can do next:

import itertools as it
k=2
S = [1, 2, 3]
for i in (range(k)):
    for groups in sorted_k_partitions(S, k-i):
        for perm in it.permutations(groups+[tuple() for j in range(i)]):
            print(perm)

The output is:

((1,), (2, 3))
((2, 3), (1,))
((2,), (1, 3))
((1, 3), (2,))
((3,), (1, 2))
((1, 2), (3,))
((1, 2, 3), ())
((), (1, 2, 3))

I'm not sure yet, that this code gives me right solution, maybe there is other way?

user3057645
  • 321
  • 1
  • 2
  • 10

1 Answers1

2

Here is an alternative solution:

def partition_k(l, k):
    n = len(l)
    if k > n:
        raise ValueError("k = {0} should be no more than n = {1}".format(k, n))

    if n == 0:
        yield []
        return

    pos = [0] * n
    while True:
        # generate output for the value
        out = [[] for _ in range(k)]
        for i in range(n):
            out[pos[i]].append(l[i])
        yield out

        #increment the value
        pos[0] += 1
        for i in range(n):
            # should we carry from this digit to the next one?
            if pos[i] == k:
                # overflow of the whole value?
                if i == n - 1:
                    return
                pos[i] = 0
                pos[i + 1] += 1
            else:
                break

Let n be the length of the list and k is the number of partitions. The idea behind this code is that each line of the output can be represented as a number of n digits in base-k system. Each "digit" shows in which bucket goes the value at corresponding position. For example line

[[2,3], [1], [4]]

can be encoded as [1,0,0,2] which means

  • 1 goes to the bucket #1
  • 2 goes to the bucket #0
  • 3 goes to the bucket #0
  • 4 goes to the bucket #2

Obviously every such n-digits base-k number represents a valid partition of the list and each partition is represented by some number. So to generate all partitions we need just iterate through all such numbers and generate corresponding partitions. It is easier to do if you use a list of digits to represent a number (in the code this is pos).

SergGr
  • 23,570
  • 2
  • 30
  • 51
  • I like your idea, how do you think can your code be faster? I've just run yours and mine codes and for `k=3, S = range(14)` input I've gotten 44.2 and 9.3 sec respectively – user3057645 Dec 26 '18 at 23:15
  • @user3057645, what are the rangers of `n` and `k` for your real task? Is speed really an issue? One of the important advantages of `itertools` is that they are actually implemented in C so they are fast. On the other hand if you do any non-trivial processing of those lists, I think that processing will take more time than my code to generate them. And the total number of partitions grows really fast with `n` and `k` because as it is clear from my algorithm the total number is `k^n` (or in Python `k**n`) – SergGr Dec 26 '18 at 23:18
  • @user3057645, I also noticed that you code contains a bug: you generate some items several times. Particularly if `groups` contains several empty buckets (for `n`=`k`=`3` this is for example `[(0,1,2),(),()]`), you will generate non-unique permutations (so all 3 of `[(0,1,2),(),()]`, `[(),(0,1,2),()]` and `[(),(),(0,1,2)]` will be duplicated, for bigger `n` there will be more excessive items). This happens because empty lists are actually equal to each other and should not be swapped but your code is not aware of this fact. – SergGr Dec 26 '18 at 23:39
  • there are cases when S = range(400), k = 5, I need this list of possible partitions for finding that one which will give me better solution for bigger problem, namely the basic parallel-machine-scheduling :\ – user3057645 Dec 26 '18 at 23:42
  • yes, that how I needed it to be, I want to generate such 'non-unique' permutations – user3057645 Dec 26 '18 at 23:44
  • 1
    @user3057645, I think you don't get my bug report. I claim that for the case `S = range(3)`, `k = 3` you'll get 30 records instead of expected 27 because you'll have each of the records `[(0,1,2),(),()]`, `[(),(0,1,2),()]` and `[(),(),(0,1,2)]` **_twice_** in the output. – SergGr Dec 26 '18 at 23:47
  • As for you problem, I think your approach is impossible to implement in practice. The total number of such partitions for `n` = 400 and `k` = 5 is `5^400` which is an insanely huge number, much bigger than the number of particles in the Universe. If you can do that many computations on your hardware, you should be earning money by breaking banking cryptography instead. In practice it means that you'll have to use some heuristics instead of brute force. – SergGr Dec 26 '18 at 23:51
  • yeah, you are right, any Idea how to fix this duplication bug? And I know that it's insane to expect that I can calculate `5^400` in a reasonable time – user3057645 Dec 26 '18 at 23:59
  • @user3057645, the only way I can think of is to treat empty and non-empty buckets explicitly separately. This will make the code a lot more complicated but I don't see any simple solution. – SergGr Dec 27 '18 at 00:16
  • This is a good piece of code. In my case, order has no meaning, so I altered it to always return a `range(3)` in order of `0, 1, 2` when numbers are placed to all possible groups. – Jari Turkia Apr 12 '23 at 16:05