1

I can iterate over a list or string in fixed-size slices like this:

for n in range(0, len(somelongstring), 10):
    print(somelongstring[n:n+10])

But how do I iterate over 10-line slices from an open file, or over some other iterable, without reading the whole thing into a list? Every so often I need to do this, and there must be a straightforward formula using itertools, but there is nothing similar in the itertools documentation, and I can't google it or figure it out and I end up solving the problem some other way. What am I missing?

with open("filename.txt") as source:
    for tenlinegroup in ten_at_a_time_magic(source, 10):
        print(...)
alexis
  • I don't believe there is a straightforward way, as there is no "generic" way to combine the 10 items back to 1 that can be yielded. I guess in your example you expect them to be combined by "\n" or put in a list or something else. – treuss Oct 22 '22 at 19:37
  • List, tuple or sub-iterable, yes. Not combined into a string (even if the elements are strings), that would be up to the consuming code. – alexis Oct 22 '22 at 20:01

5 Answers

1

I finally remembered that the term for this is "chunking", and then I was able to track it down, in the itertools recipes no less. Boiled down, it's this head-spinning little trick with zip (actually zip_longest, but who's counting):

from itertools import zip_longest

def chunk(source, n):
    return zip_longest(*([iter(source)] * n))

First you take n references to the same iterator and bundle them into a list; then you unpack this list (with *) as arguments to zip_longest. When it's used, each tuple returned by zip_longest is filled from n successive calls to the iterator.

>>> for row in chunk(range(10), 3):
...     print(row)

(0, 1, 2)
(3, 4, 5)
(6, 7, 8)
(9, None, None)

See the itertools recipes (look for grouper) and this SO answer for variations on the corner cases.
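One of those corner cases is the padded final tuple. If a short last chunk is preferable to None padding, a minimal sketch using itertools.islice could look like the following. The name chunks_of is made up for illustration and is not part of itertools. (On Python 3.12 and later, itertools.batched covers the same ground and returns tuples.)

```python
from itertools import islice

def chunks_of(iterable, n):
    """Yield lists of up to n items; the last list may be shorter."""
    it = iter(iterable)          # one shared iterator, advanced by each islice
    while True:
        chunk = list(islice(it, n))
        if not chunk:            # source exhausted
            return
        yield chunk

for row in chunks_of(range(10), 3):
    print(row)
```

Because only one chunk is held in memory at a time, the same function should work on an open file, yielding lists of up to n lines.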

alexis
0

My suggestion: if you need such a function regularly, why not write it yourself? Here is an example of how I would code it:

def slice_into_lists(iterable, groupsize):
    rv = []
    count = 0
    for i in iterable:
        rv.append(i)
        count += 1
        if count % groupsize == 0:
            yield rv
            rv = []
    if rv:
        yield rv

Of course instead of returning a list, you could also return something different.
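As a rough illustration of "something different", the variant below takes the container type as an argument. The name slice_into and the factory parameter are assumptions added for this sketch, not part of the function above, and io.StringIO stands in for an open file.

```python
import io

def slice_into(iterable, groupsize, factory=list):
    """Yield groups of up to `groupsize` items, each built with `factory`.

    `factory` is a hypothetical extra parameter (e.g. list or tuple).
    """
    group = []
    for item in iterable:
        group.append(item)
        if len(group) == groupsize:
            yield factory(group)
            group = []
    if group:                    # final, shorter group
        yield factory(group)

# Iterating a StringIO yields lines, just like an open text file.
source = io.StringIO("a\nb\nc\nd\ne\n")
for pair in slice_into(source, 2, factory=tuple):
    print(pair)
```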

treuss
0

The problem got me interested, so I worked on a solution that would not copy anything but only use the original iterator. Posting it here as it relates directly to the question.

class ichunk:
    ''' An iterator wrapper that raises StopIteration after every chunksize items '''

    def __init__(self, iterable, chunksize, fill_last_chunk, fill_value):
        self.it = iter(iterable)
        self.chunksize = chunksize
        self.counter = 0  # items handed out in the current chunk
        self.has_ended = False
        self.fill_last = fill_last_chunk
        self.fill_value = fill_value

    def __iter__(self):
        return self

    def __next__(self):
        if self.counter == self.chunksize:
            self.counter = 0  # reset so the next loop starts a new chunk
            raise StopIteration
        try:
            nexti = next(self.it)
        except StopIteration:
            self.has_ended = True
            # pad only a chunk that has already been started
            if (not self.fill_last) or self.counter == 0:
                raise
            self.counter += 1
            return self.fill_value
        self.counter += 1
        return nexti


def ichunker(iterable, chunksize, *, fill_last_chunk=False, fill_value=None):
    c = ichunk(iterable, chunksize, fill_last_chunk, fill_value)
    # The same wrapper is yielded each time. If the source length is an exact
    # multiple of chunksize, the last chunk yielded is empty.
    while not c.has_ended:
        yield c

So rather than using the value returned by ichunker directly, you iterate over each chunk in turn, like this:

    with open("filename") as fp:
        for chunk in ichunker(fp, 10):
            print("------ start of Chunk -------")
            for line in chunk:
                print("--> "+line.strip())
            print("------ End of Chunk -------")
        print("End of iteration")
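For comparison, the same no-copy idea can be sketched with a plain generator built from itertools.chain and itertools.islice. The name lazy_chunks is made up for this example. It assumes each chunk is fully consumed before the next one is requested, as in the loop above; otherwise the leftover items spill into the following chunk.

```python
from itertools import chain, islice

def lazy_chunks(iterable, n):
    """Yield sub-iterators of up to n items each, without copying into lists."""
    it = iter(iterable)
    for first in it:
        # Pulling `first` from the source means an exhausted source ends
        # the loop instead of producing an empty trailing chunk.
        yield chain([first], islice(it, n - 1))

for chunk in lazy_chunks(range(7), 3):
    print(list(chunk))
```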
treuss
0

Welcome to more_itertools. Straight from the documentation:

>>> from more_itertools import ichunked
>>> from itertools import count
>>> all_chunks = ichunked(count(), 4)
>>> c_1, c_2, c_3 = next(all_chunks), next(all_chunks), next(all_chunks)
>>> list(c_2)  # c_1's elements have been cached; c_3's haven't been
[4, 5, 6, 7]
>>> list(c_1)
[0, 1, 2, 3]
>>> list(c_3)
[8, 9, 10, 11]

You can choose between ichunked, which breaks an iterable into sub-iterables of length n, and chunked, which breaks it into lists of length n.

edd313
-1

Use itertools.islice with enumerate, like this:

import itertools
some_long_string = "ajbsjabdabdkabda"
for n in itertools.islice(enumerate(some_long_string), 0, None, 10):
    print(some_long_string[n[0]:n[0]+10])
# output
ajbsjabdab
dkabda

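If you are dealing with a file, you can read a chunk with file.read() and slice it the same way:

# for file
# reads up to 10,000 characters, then prints every 5th character among the first 1,000
with open("file1.txt") as f:
    for n in itertools.islice(enumerate(f.read(10000)), 0, 1000, 5):
        print(n[1])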
Deepak Tripathi
  • Thanks but the point is that you _cannot_ index into just any iterable. How do you apply a slice to the iterator returned by `open()`? (Anyway you don't need enumerate here, notice that you never use `n[1]`) – alexis Oct 22 '22 at 19:06
  • Enumerate is used to get the index; that's why I used n[0] to get the start index of the slice. n[1] will be the value at that position in the string? – Deepak Tripathi Oct 22 '22 at 19:37
  • With open() you need to pass some iterator to enumerate; you can pass the file pointer received from open() to enumerate? – Deepak Tripathi Oct 22 '22 at 19:38
  • yes, the file pointer is an iterable, it returns one line at a time. – alexis Oct 22 '22 at 19:57