Split python tuple in subtuples with capacity limit in functional programming style

Question

I have some tuple in python. And capacity limit, for example, is 5. I want to split tuple in subtuples limited by sum of them elements:

For example:

input: (3, 1, 4, 2, 2, 1, 1, 2) and capacity = 5
output: (3, 1) (4) (2, 2, 1) (1, 2) #each subtuple is less than 5, order safe.

I am looking for a nice expressive solution of this task preferable in functional style of programming (using itertools.dropwhile for example or something like that)

I found a way but it is ugly. – Jean-François Fabre Oct 31 '16 at 06:52 — Jean-François Fabre, Oct 31 '16 at 06:52

score 14 · Accepted Answer · answered Oct 30 '16 at 17:56

You could encapsulate non-functional part and call it from functional code:

from itertools import groupby

class GroupBySum:
    def __init__(self, maxsum):
        self.maxsum = maxsum
        self.index = 0
        self.sum = 0

    def __call__(self, value):
        self.sum += value
        if self.sum > self.maxsum:
            self.index += 1
            self.sum = value
        return self.index

# Example:

for _, l in groupby((3, 1, 4, 2, 2, 1, 1, 2), GroupBySum(5)):
    print(list(l))

This one, although so "impure" it makes me shiver, is really the most readable solution here, IMO ;) — phipsgabler, Oct 30 '16 at 21:53

phipsgabler · Answer 2 · 2016-10-30T21:49:05.790

I couldn't help it but write something close to what I'd do in Haskell (while still somewhat pythonic, I think):

def take_summed(xs, cap):
    if len(xs) <= 1:
        return xs, ()
    else:
        x, *rest = xs

        if x > cap:
            return (), xs
        else:
            init, tail = take_summed(rest, cap - x)
            return (x,) + tuple(init), tail

def split(xs, cap=5):
    if len(xs) <= 1:
        yield xs
    else:
        chunk, rest = take_summed(xs, cap)
        yield chunk

        if rest != ():
            yield from split(rest, cap)

Never hesitate to split functions into subproblems. Result:

In [45]: list(split((3, 1, 4, 2, 2, 1, 1, 2), 5))
Out[45]: [(3, 1), (4,), (2, 2, 1), (1, 2)]

The problem with making this shorter is not that it's not doable without side effects, but that you have to carry around additional accumulated state, so even when using reduce you would need to invent something really complex, to pass around the sum between applications.

This is nice and straightforward, readable and 100% functional, I am surprised not to see it higher up. — asachet, Oct 31 '16 at 09:07

score 4 · Answer 3 · answered Oct 30 '16 at 19:18

Here's a slightly different approach from that of @Jean, that slices the input tuple instead of building smaller lists with appends, and offers a little performance boost:

def group_by_capacity(tup, capacity=5):
    t = iter(tup)
    curr, s =  0, next(t)

    for i, v in enumerate(t, 1):
        if s + v  > capacity:
            yield tup[curr:i]
            curr  = i
            s = v
        else:
            s += v
    yield tup[curr:]

>>> list(group_by_capacity((3, 1, 4, 2, 2, 1, 1, 2)))
[(3, 1), (4,), (2, 2, 1), (1, 2)]

Some timing:

In [35]: from random import randrange

In [36]: start = tuple((randrange(1,5) for _ in range(100000)))

In [37]: %%timeit
   ....: list(group_by_capacity(start))
   ....:
10 loops, best of 3: 47.4 ms per loop

In [38]: %%timeit
   ....: list(generate_tuple(start))
   ....:
10 loops, best of 3: 61.1 ms per loop

I'm feeling that everyone agrees that a full-functionnal programming one-liner would be really tough! — Jean-François Fabre, Oct 30 '16 at 19:52
check my last edit, a functional approach. Horrible, but works. — Jean-François Fabre, Oct 30 '16 at 20:49

score 4 · Answer 4 · answered Oct 31 '16 at 02:22

I'm a little surprised no one has used itertools.accumulate with a key function yet. Anyway, my entry:

from itertools import groupby, accumulate

def sumgroup(seq, capacity):
    divided = accumulate(enumerate(seq),
                         lambda x,y: (x[0],x[1]+y[1])
                                     if x[1]+y[1] <= capacity else (x[0]+1,y[1]))
    seq_iter = iter(seq)
    grouped = groupby(divided, key=lambda x: x[0])
    return [[next(seq_iter) for _ in g] for _,g in grouped]

There are lots of variants, e.g. you could use zip(seq, divided) to avoid the seq_iter, etc., but this was the first way that came to mind. It gives me

In [105]: seq = [3, 1, 4, 2, 2, 1, 1, 2]

In [106]: sumgroup(seq, 5)
Out[106]: [[3, 1], [4], [2, 2, 1], [1, 2]]

and agrees with the GroupBySum result:

In [108]: all(sumgroup(p, 5) == [list(l) for _, l in groupby(p, GroupBySum(5))]
     ...:     for width in range(1,8) for p in product(range(1,6), repeat=width))
     ...:     
     ...: 
Out[108]: True

I think your code would be significantly more readable if you used named function and tuple unpacking instead of lambda — GingerPlusPlus, Nov 05 '16 at 11:32

Jean-François Fabre · Answer 5 · 2016-10-30T20:48:42.370

I was waiting for the first answer to provide a slightly functional approach:

start = (3, 1, 4, 2, 2, 1, 1, 2)

def generate_tuple(inp):
    current_sum = 0
    current_list = []
    for e in inp:
        if current_sum + e <= 5:
            current_list.append(e)
            current_sum += e
        else:
            if current_list:  # fixes "6" in first position empty tuple bug
                yield tuple(current_list)
            current_list = [e]
            current_sum = e
    yield tuple(current_list)

print([i for i in generate_tuple(start)])

result:

[(3, 1), (4,), (2, 2, 1), (1, 2)]

EDIT: I found a full functional approach using a memory effect otherwise it is not doable. It's ugly and it hurts me only when I think how I'll explain it clearly. I have spiked the input data set a little or it would have been too easy

start = (6, 7, 3, 1, 4, 2, 2, 1, 1, 2, 3, 1 ,3, 1, 1)

now the code. 3 lines, get some aspirin, you'll need it as much as I did:

mem=[0,0]
start = start + (5,)
print([start[mem[-2]:n] for i in range(0,len(start)) for n in range(i+1,len(start)) if ((n==i+1 and start[i]>=5) or (sum(start[mem[-1]:n])<=5 and sum(start[mem[-1]:n+1])>5)) and not mem.append(n)])

I'll try to explain.

I use a memory effect because it's not possible without it. Stored in mem and set to 0,0 at start
since the function disregards last item, I modify input data to add threshold value to previous values are not dropped
the only simple thing is the computation of 2 sums and detection of the index from which it exceeds the threshold. When this threshold is detected, both conditions are met and the third condition is activated: store index in mem. Since append returns None, the last condition is made to be always true
That ((n==i+1 and start[i]>=5) is to detect single values greater or equal to 5.
The rest is some fine-tuning until the output is the same as the procedural approach which now doesn't look so bad in comparison :)

If the first item in the tuple is greater than `5` this will yield an empty tuple as the first result. — Moses Koledoye, Oct 30 '16 at 19:47

score 1 · Answer 6 · answered Oct 30 '16 at 16:57

Not sure why you need them all in tuples, but if you don't you can just remove the tuple(...) casting:

def chunkit(tpl, capacity):
    ret = []
    cur = []
    for x in tpl:
        if sum(cur) + x > capacity:
            ret.append(tuple(cur))
            cur = [x]
        else:
            cur.append(x)
    if cur != []:
        ret.append(tuple(cur))

    return tuple(ret)

A few examples:

In [24]: chunkit((3, 1, 4, 2, 2, 1, 1), 5)
Out[24]: ((3, 1), (4,), (2, 2, 1), (1,))

In [25]: chunkit((3, 1, 4, 2, 2, 1, ), 5)
Out[25]: ((3, 1), (4,), (2, 2, 1))

In [26]: chunkit((3, 1, 4, 2, 2, 1, 5), 5)
Out[26]: ((3, 1), (4,), (2, 2, 1), (5,))

In [27]: chunkit((3, 1, 4, 2, 2, 1, 5, 6), 5)
Out[27]: ((3, 1), (4,), (2, 2, 1), (5,), (6,))

In [28]: chunkit((3, 1, 4, 2, 2, 1, 5, 6, 1, 6), 5)
Out[28]: ((3, 1), (4,), (2, 2, 1), (5,), (6,), (1,), (6,))

score 1 · Answer 7 · answered Oct 30 '16 at 18:06

Don't know if this counts as functional, but it's the closest I could think of:

def groupLimit(iterable, limit):
    i, cSum = 0, 0
    def pred(x):
        nonlocal i, cSum, limit
        i, cSum = (i + 1, x) if (x + cSum) > limit else (i, cSum + x)
        return i if x <= limit else -1
    return (tuple(g) for k, g in itertools.groupby(iterable, pred) if k != -1)

This will also sort out single values greater than the limit. If that's not intended the last two lines can be changed to:

        return i
    return (tuple(g) for k, g in itertools.groupby(iterable, pred))

example:

t = (3, 1, 6, 2, 2, 1, 1, 2)
a = groupLimit(t,5)
print(tuple(a))
# version 1 -> ((3, 1), (2, 2, 1), (1, 2))
# version 2 -> ((3, 1), (6,), (2, 2, 1), (1, 2))

score 1 · Answer 8 · edited May 23 '17 at 12:09

1

Lets define the powerset with itertools

from itertools import chain, combinations

def powerset(lst):
    for subset in chain.from_iterable(combinations(lst, r) for r in range(len(lst)+1)):
        yield subset

Then we can do it in a one-liner

[subset for subset in powerset(input) if sum(subset)<=capacity]

edited May 23 '17 at 12:09

Community

1
1

answered Oct 30 '16 at 18:14

Uri Goren

13,386
6
58
110

1

Good idea! very readable and expressive but unfortunately it does not solve a problem beacause it's breaks order (subtuple element's must be neighbors) and also has 2^N time complexity – Evg Oct 30 '16 at 20:30

score 1 · Answer 9 · answered Oct 31 '16 at 01:02

A more general solution:

def groupwhile(iterable,predicate,accumulator_function):
    continue_group = False
    iterator = iter(iterable)
    try:
        accumulated = next(iterator)
    except StopIteration:
        return
    current_group = [accumulated]
    for item in iterator:
        continue_group = predicate(accumulated,item)
        if continue_group:
            current_group.append(item)
            accumulated = accumulator_function(accumulated,item)
        else:
            yield current_group
            accumulated = item
            current_group = [item]

    yield current_group

#your case
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda previous_sum,item: previous_sum + item <= 5,
    lambda previous_sum,item: previous_sum + item,
))) == [[3, 1], [4], [2, 2, 1], [1, 2]]

#equivalent to groupby with key not set
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda previous_item,item: previous_item == item,
    lambda _,item: item,
))) == [[3], [1], [4], [2, 2], [1, 1], [2]]

#break on duplicates
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda previous_item,item: previous_item != item,
    lambda _,item: item,
))) == [[3, 1, 4, 2], [2, 1], [1, 2]]

#start new group when the number is one
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda _,item: item != 1,
    lambda _1,_2: None,
))) == [[3], [1, 4, 2, 2], [1], [1, 2]]

Fi3 · Answer 10 · 2016-11-05T00:08:43.540

My solution, not very clean but it use just reduce:

# int, (int, int, ...) -> ((int, ...), ...)
def grupBySum(capacity, _tuple):

    def  _grupBySum(prev, number):
        counter = prev['counter']
        result = prev['result']
        counter = counter + (number,)
        if sum(counter) > capacity:
            result = result + (counter[:-1],)
            return {'counter': (number,), 'result': result}
        else:
            return {'counter': counter, 'result': result}

result = reduce(_grupBySum, _tuple, {'counter': (), 'result': ()}).values()
return result[1]  + (result[0],)

f = (3, 1, 4, 2, 2, 1, 1, 2)
h = grupBySum(5, f)
print(h) # -> ((3, 1), (4,), (2, 2, 1), (1, 2))

Split python tuple in subtuples with capacity limit in functional programming style

10 Answers10