13

I have some tuple in python. And capacity limit, for example, is 5. I want to split tuple in subtuples limited by sum of them elements:

For example:

input: (3, 1, 4, 2, 2, 1, 1, 2) and capacity = 5
output: (3, 1) (4) (2, 2, 1) (1, 2) #each subtuple is less than 5, order safe.

I am looking for a nice expressive solution of this task preferable in functional style of programming (using itertools.dropwhile for example or something like that)

Netch
  • 4,171
  • 1
  • 19
  • 31
Evg
  • 2,978
  • 5
  • 43
  • 58

10 Answers10

14

You could encapsulate non-functional part and call it from functional code:

from itertools import groupby

class GroupBySum:
    def __init__(self, maxsum):
        self.maxsum = maxsum
        self.index = 0
        self.sum = 0

    def __call__(self, value):
        self.sum += value
        if self.sum > self.maxsum:
            self.index += 1
            self.sum = value
        return self.index

# Example:

for _, l in groupby((3, 1, 4, 2, 2, 1, 1, 2), GroupBySum(5)):
    print(list(l))
GingerPlusPlus
  • 5,336
  • 1
  • 29
  • 52
6

I couldn't help it but write something close to what I'd do in Haskell (while still somewhat pythonic, I think):

def take_summed(xs, cap):
    if len(xs) <= 1:
        return xs, ()
    else:
        x, *rest = xs

        if x > cap:
            return (), xs
        else:
            init, tail = take_summed(rest, cap - x)
            return (x,) + tuple(init), tail

def split(xs, cap=5):
    if len(xs) <= 1:
        yield xs
    else:
        chunk, rest = take_summed(xs, cap)
        yield chunk

        if rest != ():
            yield from split(rest, cap)

Never hesitate to split functions into subproblems. Result:

In [45]: list(split((3, 1, 4, 2, 2, 1, 1, 2), 5))
Out[45]: [(3, 1), (4,), (2, 2, 1), (1, 2)]

The problem with making this shorter is not that it's not doable without side effects, but that you have to carry around additional accumulated state, so even when using reduce you would need to invent something really complex, to pass around the sum between applications.

phipsgabler
  • 20,535
  • 4
  • 40
  • 60
  • This is nice and straightforward, readable and 100% functional, I am surprised not to see it higher up. – asachet Oct 31 '16 at 09:07
4

Here's a slightly different approach from that of @Jean, that slices the input tuple instead of building smaller lists with appends, and offers a little performance boost:

def group_by_capacity(tup, capacity=5):
    t = iter(tup)
    curr, s =  0, next(t)

    for i, v in enumerate(t, 1):
        if s + v  > capacity:
            yield tup[curr:i]
            curr  = i
            s = v
        else:
            s += v
    yield tup[curr:]

>>> list(group_by_capacity((3, 1, 4, 2, 2, 1, 1, 2)))
[(3, 1), (4,), (2, 2, 1), (1, 2)]

Some timing:

In [35]: from random import randrange

In [36]: start = tuple((randrange(1,5) for _ in range(100000)))

In [37]: %%timeit
   ....: list(group_by_capacity(start))
   ....:
10 loops, best of 3: 47.4 ms per loop

In [38]: %%timeit
   ....: list(generate_tuple(start))
   ....:
10 loops, best of 3: 61.1 ms per loop
Moses Koledoye
  • 77,341
  • 8
  • 133
  • 139
4

I'm a little surprised no one has used itertools.accumulate with a key function yet. Anyway, my entry:

from itertools import groupby, accumulate

def sumgroup(seq, capacity):
    divided = accumulate(enumerate(seq),
                         lambda x,y: (x[0],x[1]+y[1])
                                     if x[1]+y[1] <= capacity else (x[0]+1,y[1]))
    seq_iter = iter(seq)
    grouped = groupby(divided, key=lambda x: x[0])
    return [[next(seq_iter) for _ in g] for _,g in grouped]

There are lots of variants, e.g. you could use zip(seq, divided) to avoid the seq_iter, etc., but this was the first way that came to mind. It gives me

In [105]: seq = [3, 1, 4, 2, 2, 1, 1, 2]

In [106]: sumgroup(seq, 5)
Out[106]: [[3, 1], [4], [2, 2, 1], [1, 2]]

and agrees with the GroupBySum result:

In [108]: all(sumgroup(p, 5) == [list(l) for _, l in groupby(p, GroupBySum(5))]
     ...:     for width in range(1,8) for p in product(range(1,6), repeat=width))
     ...:     
     ...: 
Out[108]: True
DSM
  • 342,061
  • 65
  • 592
  • 494
3

I was waiting for the first answer to provide a slightly functional approach:

start = (3, 1, 4, 2, 2, 1, 1, 2)

def generate_tuple(inp):
    current_sum = 0
    current_list = []
    for e in inp:
        if current_sum + e <= 5:
            current_list.append(e)
            current_sum += e
        else:
            if current_list:  # fixes "6" in first position empty tuple bug
                yield tuple(current_list)
            current_list = [e]
            current_sum = e
    yield tuple(current_list)

print([i for i in generate_tuple(start)])

result:

[(3, 1), (4,), (2, 2, 1), (1, 2)]

EDIT: I found a full functional approach using a memory effect otherwise it is not doable. It's ugly and it hurts me only when I think how I'll explain it clearly. I have spiked the input data set a little or it would have been too easy

start = (6, 7, 3, 1, 4, 2, 2, 1, 1, 2, 3, 1 ,3, 1, 1)

now the code. 3 lines, get some aspirin, you'll need it as much as I did:

mem=[0,0]
start = start + (5,)
print([start[mem[-2]:n] for i in range(0,len(start)) for n in range(i+1,len(start)) if ((n==i+1 and start[i]>=5) or (sum(start[mem[-1]:n])<=5 and sum(start[mem[-1]:n+1])>5)) and not mem.append(n)])

I'll try to explain.

  • I use a memory effect because it's not possible without it. Stored in mem and set to 0,0 at start
  • since the function disregards last item, I modify input data to add threshold value to previous values are not dropped
  • the only simple thing is the computation of 2 sums and detection of the index from which it exceeds the threshold. When this threshold is detected, both conditions are met and the third condition is activated: store index in mem. Since append returns None, the last condition is made to be always true
  • That ((n==i+1 and start[i]>=5) is to detect single values greater or equal to 5.
  • The rest is some fine-tuning until the output is the same as the procedural approach which now doesn't look so bad in comparison :)
Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219
1

Not sure why you need them all in tuples, but if you don't you can just remove the tuple(...) casting:

def chunkit(tpl, capacity):
    ret = []
    cur = []
    for x in tpl:
        if sum(cur) + x > capacity:
            ret.append(tuple(cur))
            cur = [x]
        else:
            cur.append(x)
    if cur != []:
        ret.append(tuple(cur))

    return tuple(ret)

A few examples:

In [24]: chunkit((3, 1, 4, 2, 2, 1, 1), 5)
Out[24]: ((3, 1), (4,), (2, 2, 1), (1,))

In [25]: chunkit((3, 1, 4, 2, 2, 1, ), 5)
Out[25]: ((3, 1), (4,), (2, 2, 1))

In [26]: chunkit((3, 1, 4, 2, 2, 1, 5), 5)
Out[26]: ((3, 1), (4,), (2, 2, 1), (5,))

In [27]: chunkit((3, 1, 4, 2, 2, 1, 5, 6), 5)
Out[27]: ((3, 1), (4,), (2, 2, 1), (5,), (6,))

In [28]: chunkit((3, 1, 4, 2, 2, 1, 5, 6, 1, 6), 5)
Out[28]: ((3, 1), (4,), (2, 2, 1), (5,), (6,), (1,), (6,))
Dekel
  • 60,707
  • 10
  • 101
  • 129
1

Don't know if this counts as functional, but it's the closest I could think of:

def groupLimit(iterable, limit):
    i, cSum = 0, 0
    def pred(x):
        nonlocal i, cSum, limit
        i, cSum = (i + 1, x) if (x + cSum) > limit else (i, cSum + x)
        return i if x <= limit else -1
    return (tuple(g) for k, g in itertools.groupby(iterable, pred) if k != -1)

This will also sort out single values greater than the limit. If that's not intended the last two lines can be changed to:

        return i
    return (tuple(g) for k, g in itertools.groupby(iterable, pred))

example:

t = (3, 1, 6, 2, 2, 1, 1, 2)
a = groupLimit(t,5)
print(tuple(a))
# version 1 -> ((3, 1), (2, 2, 1), (1, 2))
# version 2 -> ((3, 1), (6,), (2, 2, 1), (1, 2))
Velocirobtor
  • 188
  • 7
1

Lets define the powerset with itertools

from itertools import chain, combinations

def powerset(lst):
    for subset in chain.from_iterable(combinations(lst, r) for r in range(len(lst)+1)):
        yield subset

Then we can do it in a one-liner

[subset for subset in powerset(input) if sum(subset)<=capacity]
Community
  • 1
  • 1
Uri Goren
  • 13,386
  • 6
  • 58
  • 110
  • 1
    Good idea! very readable and expressive but unfortunately it does not solve a problem beacause it's breaks order (subtuple element's must be neighbors) and also has 2^N time complexity – Evg Oct 30 '16 at 20:30
1

A more general solution:

def groupwhile(iterable,predicate,accumulator_function):
    continue_group = False
    iterator = iter(iterable)
    try:
        accumulated = next(iterator)
    except StopIteration:
        return
    current_group = [accumulated]
    for item in iterator:
        continue_group = predicate(accumulated,item)
        if continue_group:
            current_group.append(item)
            accumulated = accumulator_function(accumulated,item)
        else:
            yield current_group
            accumulated = item
            current_group = [item]

    yield current_group

#your case
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda previous_sum,item: previous_sum + item <= 5,
    lambda previous_sum,item: previous_sum + item,
))) == [[3, 1], [4], [2, 2, 1], [1, 2]]

#equivalent to groupby with key not set
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda previous_item,item: previous_item == item,
    lambda _,item: item,
))) == [[3], [1], [4], [2, 2], [1, 1], [2]]

#break on duplicates
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda previous_item,item: previous_item != item,
    lambda _,item: item,
))) == [[3, 1, 4, 2], [2, 1], [1, 2]]

#start new group when the number is one
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda _,item: item != 1,
    lambda _1,_2: None,
))) == [[3], [1, 4, 2, 2], [1], [1, 2]]
Siphor
  • 2,522
  • 2
  • 13
  • 10
1

My solution, not very clean but it use just reduce:

# int, (int, int, ...) -> ((int, ...), ...)
def grupBySum(capacity, _tuple):

    def  _grupBySum(prev, number):
        counter = prev['counter']
        result = prev['result']
        counter = counter + (number,)
        if sum(counter) > capacity:
            result = result + (counter[:-1],)
            return {'counter': (number,), 'result': result}
        else:
            return {'counter': counter, 'result': result}

result = reduce(_grupBySum, _tuple, {'counter': (), 'result': ()}).values()
return result[1]  + (result[0],)

f = (3, 1, 4, 2, 2, 1, 1, 2)
h = grupBySum(5, f)
print(h) # -> ((3, 1), (4,), (2, 2, 1), (1, 2))
Fi3
  • 429
  • 6
  • 17