3

I've tried searching for a solution to this, but my ignorance of precise terminology doesn't help, hopefully the title of the question and the code below is enough explanation.

Here's my working so far:

C = [1,1,1,1,2,3,1,1,1,2,1]
sub_C = []
chunked_C = []
counter = 0
for i in C:
    counter += i
    if counter <= 3:
        sub_C.append(i)
    else:
        chunked_C.append(list(sub_C))
        del sub_C[:]
        counter = i
        sub_C.append(i)
print chunked_C

I want chunked_C to produce: [[1,1,1],[1,2],[3],[1,1,1],[2,1]]

Not sure where I'm going wrong, perhaps someone can help.

Edit: I've corrected the typos.

Also:

A slight revision in that I would need the incomplete tail of the list to be chunked too i.e. where the value is less than 3 but I run out of numbers.

e.g:

C = [1,1,1,1,2,3,1,1,1,2,1,1]
so chunked_C = [[1,1,1],[1,2],[3],[1,1,1],[2,1],[1]]

Hope that makes sense.

A further revision:

if C = [1,1,1,1,1,2,3,1,1,1,2,1]

chunked_C would equal [[1,1,1],[1,1],[2],[3],[1,1,1],[2,1]]

So I guess the logic needs to be revised further.

Johntyb
  • 105
  • 4

7 Answers7

6

Edit: Firstly, a correction as Ashwini Points out in the comments, we need to be sure we release the last chunk, even if it doesn't hit the target.

That said, there is a better solution to this problem using itertools.groupby():

import itertools

c = [1,1,1,1,2,3,1,1,1,2,1]

class group_by_sum:
    def __init__(self, target):
        self.target = 3
        self.current = False
        self.sum = 0

    def __call__(self, item):
        self.sum += item
        if self.sum > self.target:
            self.sum = item
            self.current = not self.current
        return self.current

    def group(self, iterable):
        return [tuple(items) for _, items in itertools.groupby(iterable, self)]

>>> group_by_sum(3).group(c)

[(1, 1, 1), (1, 2), (3,), (1, 1, 1), (2, 1)]

Obviously, the convenience method at the end isn't necessarily important, but it makes it simpler to use.


Old Answer:

This can be done nicely with a generator:

def chunk_to_sum(iterable, target):
    chunk_sum = 0
    chunk = []
    for item in iterable:
        chunk_sum += item
        if chunk_sum > target:
            yield chunk
            chunk = [item]
            chunk_sum = item
        else:
            chunk.append(item)
    if chunk: yield chunk

>>> list(chunk_to_sum([1, 1, 1, 1, 2, 3, 1, 1, 1, 2, 1], 3))
[[1, 1, 1], [1, 2], [3], [1, 1, 1], [2, 1]]
Gareth Latty
  • 86,389
  • 17
  • 178
  • 183
  • The OP might find [this very excellent answer](http://stackoverflow.com/a/231855/2359271) to be helpful if they are unfamiliar with the generator logic in yours. – Air Aug 30 '13 at 16:09
  • You need a `if chunk: yield chunk` in the end in case some items are left in `chunk` – Ashwini Chaudhary Aug 30 '13 at 18:31
  • The first solution with the revision provided by @AshwiniChaudhary worked perfectly. Thanks Lattyware much appreciated. – Johntyb Aug 31 '13 at 08:44
1

I think the line "if counter < 3:" is backwards logic from what you want it to be. Here's a corrected version I wrote up:

def chunk(to_chunk, n):
    """ 
    >>> chunk([1,1,1,1,2,3,1,1,1,2,1], 3) 
    [[1, 1, 1], [1, 2], [3], [1, 1, 1], [2, 1]]
    """

    result, accum, total = [], [], 0

    for i in to_chunk:
        accum.append(i)

        if total + i >= n:
            result.append(accum)
            accum, total = [], 0
        else:
            total += i

    return result
1

Yeah, I think you want to use a generator. This is just a slight clean up of Lattyware's version. (go vote his up)

def chunk_to_sum(items, target_value):
    chunk = []
    for item in items:
        chunk.append(item)
        if sum(chunk) >= target_value:
            yield chunk
            chunk = []


C = [1,1,1,1,2,3,1,1,1,2,1,1]
list(chunk_to_sum(C, 3))

Out[2]: [[1, 1, 1], [1, 2], [3], [1, 1, 1], [2, 1]]

monkut
  • 42,176
  • 24
  • 124
  • 155
0

For unknown reasons, the program ignores the last integer in the list, so here's a fixed version, just added an integer to the list so it ignores it and cares for the rest correctly

C = [1,1,1,1,2,3,1,1,1,2,1,1]
sub_C = []
chunked_C = []
counter = 0
for i in C:
    counter += i
    if counter <= 3:
        sub_C.append(i)
    else:
        chunked_C.append(list(sub_C))
        del sub_C[:]
        counter = i
        sub_C.append(i)
print chunked_C

produces:

[[1, 1, 1], [1, 2], [3], [1, 1, 1], [2, 1]]
K DawG
  • 13,287
  • 9
  • 35
  • 66
  • 1
    -1. You haven't fixed anything, you've just implemented a broken algorithm that requires the wrong input to create the right output. Step through the logic item by item - pencil and paper if necessary - if you want to understand exactly why the last item is being ignored. It has to do with the ordering of statements; what is the very last thing that happens in the `for` loop? What still needs to happen when the `for` loop ends? – Air Aug 30 '13 at 16:16
0

You can use an iterator here, no need of an extra list here:

def solve(lis, chunk_size):
    it = iter(lis)
    first = next(it)
    chunked_C = [[first]]
    counter = first
    for i in it:
        if counter + i < chunk_size:
            chunked_C[-1].append(i)
            counter += i
        else:
            chunked_C.append([i])
            counter = i 
    return  chunked_C
print solve([1,1,1,1,2,3,1,1,1,2,1], 4)
print solve([1,1,1,1,1,2,3,1,1,1,2,1], 4)

output:

[[1, 1, 1], [1, 2], [3], [1, 1, 1], [2, 1]]
[[1, 1, 1], [1, 1], [2], [3], [1, 1, 1], [2, 1]]

Using a generator function:

def solve(lis, chunk_size):
    chunk = []
    counter = 0
    for i in lis:
        if counter + i < chunk_size:
            chunk.append(i)
            counter += i
        else:
            yield chunk
            chunk = [i]
            counter = i
    if chunk: yield chunk

print list(solve([1,1,1,1,2,3,1,1,1,2,1], 4))
print list(solve([1,1,1,1,1,2,3,1,1,1,2,1], 4))

output:

[[1, 1, 1], [1, 2], [3], [1, 1, 1], [2, 1]]
[[1, 1, 1], [1, 1], [2], [3], [1, 1, 1], [2, 1]]
Ashwini Chaudhary
  • 244,495
  • 58
  • 464
  • 504
0

Try this:

def get_subsection_from_index(l, index):
    subsection = []
    while sum(subsection) < 3:
        if index < len(l):
            subsection.append(l[index])
        else:
            break
        index += 1
    return subsection, index

i = 0
chunked_C = []
while i < len(C):
    subsection, i = get_subsection_from_index(C, i)
    chunked_C.append(subsection)
Germano
  • 2,452
  • 18
  • 25
0

How about two nested while loops:

C = [1,1,1,1,2,3,1,1,1,2,1]
out=[]
while C:
    temp=[C.pop(0)]
    while C and sum(temp) <3:
        temp.append(C.pop(0))
    out.append(temp)    

print out    
# [[1, 1, 1], [1, 2], [3], [1, 1, 1], [2, 1]]

If performance is important, use a deque instead of a list:

from collections import deque

C = [1,1,1,1,2,3,1,1,1,2,1]
Cd=deque(C)
out=[]
while Cd:
    temp=[Cd.popleft()]
    while Cd and sum(temp) <3:
        temp.append(Cd.popleft())
    out.append(temp)    

Or, use the similar logic and convert to a generator:

def gen_chunks(it,n):
    try:
        while it:
            ck=[]
            while sum(ck)<n:
                ck.append(next(it))
            yield ck
    except StopIteration:
       pass 

print list(gen_chunks(iter(C),3))
# [[1, 1, 1], [1, 2], [3], [1, 1, 1], [2, 1]]

If you want use the 'crumbs' that don't add up to n, that is easy to add:

def gen_chunks(it,n, return_crumbs=False):
    ck=[]
    try:
        while it:
            if sum(ck)>=n:
                yield ck
                ck=[]
            else:
                ck.append(next(it))
    except StopIteration:
        if return_crumbs and ck:
            yield ck

print list(gen_chunks(iter(C),3))
print list(gen_chunks(iter(C),3,True))
# [[1, 1, 1], [1, 2], [3], [1, 1, 1], [2, 1]]        no crumbs
# [[1, 1, 1], [1, 2], [3], [1, 1, 1], [2, 1], [1]]   with crumbs
#                                             ^^^      crumb
dawg
  • 98,345
  • 23
  • 131
  • 206
  • How this is going to give better performance compared to simple iteration over list? – Ashwini Chaudhary Aug 30 '13 at 16:11
  • @AshwiniChaudhary: It **is** simple iteration over a list. Just like a `for` loop you only deal with each element once. And (for me) far simpler to read. – dawg Aug 30 '13 at 16:44