44

I'm looking for a fast, clean, pythonic way to divide a list into exactly n nearly-equal partitions.

partition([1,2,3,4,5],5)->[[1],[2],[3],[4],[5]]
partition([1,2,3,4,5],2)->[[1,2],[3,4,5]] (or [[1,2,3],[4,5]])
partition([1,2,3,4,5],3)->[[1,2],[3,4],[5]] (there are other ways to slice this one too)

There are several answers in here Iteration over list slices that run very close to what I want, except they are focused on the size of the list, and I care about the number of the lists (some of them also pad with None). These are trivially converted, obviously, but I'm looking for a best practice.

Similarly, people have pointed out great solutions here How do you split a list into evenly sized chunks? for a very similar problem, but I'm more interested in the number of partitions than the specific size, as long as it's within 1. Again, this is trivially convertible, but I'm looking for a best practice.

martineau
  • 119,623
  • 25
  • 170
  • 301
Drew
  • 8,675
  • 6
  • 43
  • 41

5 Answers5

32

Just a different take, that only works if [[1,3,5],[2,4]] is an acceptable partition, in your example.

def partition ( lst, n ):
    return [ lst[i::n] for i in xrange(n) ]

This satisfies the example mentioned in @Daniel Stutzbach's example:

partition(range(105),10)
# [[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
# [1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 101],
# [2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102],
# [3, 13, 23, 33, 43, 53, 63, 73, 83, 93, 103],
# [4, 14, 24, 34, 44, 54, 64, 74, 84, 94, 104],
# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95],
# [6, 16, 26, 36, 46, 56, 66, 76, 86, 96],
# [7, 17, 27, 37, 47, 57, 67, 77, 87, 97],
# [8, 18, 28, 38, 48, 58, 68, 78, 88, 98],
# [9, 19, 29, 39, 49, 59, 69, 79, 89, 99]]
Manuel Zubieta
  • 582
  • 1
  • 6
  • 13
28

Here's a version that's similar to Daniel's: it divides as evenly as possible, but puts all the larger partitions at the start:

def partition(lst, n):
    q, r = divmod(len(lst), n)
    indices = [q*i + min(i, r) for i in xrange(n+1)]
    return [lst[indices[i]:indices[i+1]] for i in xrange(n)]

It also avoids the use of float arithmetic, since that always makes me uncomfortable. :)

Edit: an example, just to show the contrast with Daniel Stutzbach's solution

>>> print [len(x) for x in partition(range(105), 10)]
[11, 11, 11, 11, 11, 10, 10, 10, 10, 10]
Mark Dickinson
  • 29,088
  • 9
  • 83
  • 120
23
def partition(lst, n):
    division = len(lst) / float(n)
    return [ lst[int(round(division * i)): int(round(division * (i + 1)))] for i in xrange(n) ]

>>> partition([1,2,3,4,5],5)
[[1], [2], [3], [4], [5]]
>>> partition([1,2,3,4,5],2)
[[1, 2, 3], [4, 5]]
>>> partition([1,2,3,4,5],3)
[[1, 2], [3, 4], [5]]
>>> partition(range(105), 10)
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], [32, 33, 34, 35, 36, 37, 38, 39, 40, 41], [42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52], [53, 54, 55, 56, 57, 58, 59, 60, 61, 62], [63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73], [74, 75, 76, 77, 78, 79, 80, 81, 82, 83], [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94], [95, 96, 97, 98, 99, 100, 101, 102, 103, 104]]

Python 3 version:

def partition(lst, n):
    division = len(lst) / n
    return [lst[round(division * i):round(division * (i + 1))] for i in range(n)]
ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
João Silva
  • 89,303
  • 29
  • 152
  • 158
4

Below is one way.

def partition(lst, n):
    increment = len(lst) / float(n)
    last = 0
    i = 1
    results = []
    while last < len(lst):
        idx = int(round(increment * i))
        results.append(lst[last:idx])
        last = idx
        i += 1
    return results

If len(lst) cannot be evenly divided by n, this version will distribute the extra items at roughly equal intervals. For example:

>>> print [len(x) for x in partition(range(105), 10)]
[11, 10, 11, 10, 11, 10, 11, 10, 11, 10]

The code could be simpler if you don't mind all of the 11s being at the beginning or the end.

Daniel Stutzbach
  • 74,198
  • 17
  • 88
  • 77
  • 1
    The use of floating-point could result in machine-dependent results: when increment*i is (mathematically) exactly halfway between two integers, the rounding could go either way depending on what numerical errors have been introduced. How about using something like `idx = len(lst)*i//n` instead? Or perhaps `idx = (len(lst)*i + n//2)//n` to get results similar to the current code. – Mark Dickinson Apr 17 '10 at 21:40
  • FYI this is the well-known [Bresenham algorithm](http://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm). – smci Jul 03 '11 at 08:16
0

This answer provides a function split(list_, n, max_ratio), for people who want to split their list into n pieces with at most max_ratio ratio in piece length. It allows for more variation than the questioner's 'at most 1 difference in piece length'.

It works by sampling n piece lengths within the desired ratio range [1 , max_ratio), placing them after each other to form a 'broken stick' with the right distances between the 'break points' but the wrong total length. Scaling the broken stick to the desired length gives us the approximate positions of the break points we want. To get integer break points requires subsequent rounding.

Unfortunately, the roundings can conspire to make pieces just too short, and let you exceed the max_ratio. See the bottom of this answer for an example.

import random

def splitting_points(length, n, max_ratio):
    """n+1 slice points [0, ..., length] for n random-sized slices.

    max_ratio is the largest allowable ratio between the largest and the
    smallest part.
    """
    ratios = [random.uniform(1, max_ratio) for _ in range(n)]
    normalized_ratios = [r / sum(ratios) for r in ratios]
    cumulative_ratios = [
        sum(normalized_ratios[0:i])
        for i in range(n+1)
    ]
    scaled_distances = [
        int(round(r * length))
        for r in cumulative_ratios
    ]

    return scaled_distances


def split(list_, n, max_ratio):
    """Slice a list into n randomly-sized parts.

    max_ratio is the largest allowable ratio between the largest and the
    smallest part.
    """

    points = splitting_points(len(list_), n, ratio)

    return [
        list_[ points[i] : points[i+1] ]
        for i in range(n)
    ]

You can try it out like so:

for _ in range(10):
    parts = split('abcdefghijklmnopqrstuvwxyz', 4, 2)
    print([(len(part), part) for part in parts])

Example of a bad result:

parts = split('abcdefghijklmnopqrstuvwxyz', 10, 2)

# lengths range from 1 to 4, not 2 to 4
[(3, 'abc'),  (3, 'def'), (1, 'g'),
 (4, 'hijk'), (3, 'lmn'), (2, 'op'),
 (2, 'qr'),  (3, 'stu'),  (2, 'vw'),
 (3, 'xyz')]
Esteis
  • 4,669
  • 2
  • 29
  • 45