10

I'm looking for a way to "page through" a Python iterator. That is, I would like to wrap a given iterator iter and page_size with another iterator that would return the items from iter as a series of "pages". Each page would itself be an iterator with up to page_size iterations.

I looked through itertools and the closest thing I saw is itertools.islice. In some ways, what I'd like is the opposite of itertools.chain -- instead of chaining a series of iterators together into one iterator, I'd like to break an iterator up into a series of smaller iterators. I was expecting to find a paging function in itertools but couldn't locate one.

I came up with the following pager class and demonstration.

import itertools

class pager(object):
    """
    takes the iterable iter and page_size to create an iterator that "pages through" iter.  That is, pager returns a series of page iterators,
    each returning up to page_size items from iter.
    """
    def __init__(self,iter, page_size):
        self.iter = iter
        self.page_size = page_size
    def __iter__(self):
        return self
    def next(self):
        # if self.iter has not been exhausted, return the next slice
        # I'm using a technique from 
        # https://stackoverflow.com/questions/1264319/need-to-add-an-element-at-the-start-of-an-iterator-in-python
        # to check for iterator completion by cloning self.iter into 3 copies:
        # 1) self.iter gets advanced to the next page
        # 2) peek is used to check on whether self.iter is done
        # 3) iter_for_return is to create an independent page of the iterator to be used by caller of pager
        self.iter, peek, iter_for_return = itertools.tee(self.iter, 3)
        try:
            next_v = next(peek)
        except StopIteration: # catch the exception and then raise it
            raise StopIteration
        else:
            # consume the page from the iterator so that the next page is up in the next iteration
            # is there a better way to do this?
            # 
            for i in itertools.islice(self.iter,self.page_size): pass
            return itertools.islice(iter_for_return,self.page_size)



iterator_size = 10
page_size = 3

my_pager = pager(xrange(iterator_size),page_size)

# skip a page, then print out rest, and then show the first page
page1 = my_pager.next()

for page in my_pager:
    for i in page:
        print i
    print "----"

print "skipped first page: " , list(page1)   

I'm looking for some feedback and have the following questions:

  1. Is there something already in itertools that serves as a pager and that I'm overlooking?
  2. Cloning self.iter 3 times seems kludgy to me. One clone is to check whether self.iter has any more items. I decided to go with a technique Alex Martelli suggested (aware that he wrote of a wrapping technique). The second clone was to enable the returned page to be independent of the internal iterator (self.iter). Is there a way to avoid making 3 clones?
  3. Is there a better way to deal with the StopIteration exception besides catching it and then raising it again? I am tempted to not catch it at all and let it bubble up.

Thanks! -Raymond

Raymond Yee
  • Related: http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python http://stackoverflow.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks http://stackoverflow.com/questions/1335392/iteration-over-list-slices http://stackoverflow.com/questions/760753/iterate-over-a-python-sequence-in-multiples-of-n – jfs Feb 27 '10 at 18:55

6 Answers

8

Look at grouper(), from the itertools recipes.

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
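As a quick check of what the recipe returns, here is a small usage sketch (assuming Python 3, where the function is named zip_longest). Note that the last group is padded with the fill value, which differs from the "up to page_size" behavior asked for in the question.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # n references to the SAME iterator, so each output tuple pulls n items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

pages = list(grouper(range(10), 3))
print(pages)
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
```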
Liz Av
Ignacio Vazquez-Abrams
  • Thanks for pointing out the recipes. I can see using grouper because it's efficient and adapting the recipe to behave exactly like my Pager. I'm still curious as to whether Pager as it stands has much merit -- or I should abandon it for a grouper-like approach. – Raymond Yee Feb 27 '10 at 18:15
4

Why aren't you using this?

def grouper( page_size, iterable ):
    page= []
    for item in iterable:
        page.append( item )
        if len(page) == page_size:
            yield page
            page= []
    yield page

"Each page would itself be an iterator with up to page_size" items. Each page is a simple list of items, which is iterable. You could use yield iter(page) to yield the iterator instead of the object, but I don't see how that improves anything.

It throws a standard StopIteration at the end.

What more would you want?
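One edge case to be aware of with the generator above: when the number of items is an exact multiple of page_size (or the iterable is empty), the final yield produces an empty list. The sketch below is a variant, not the answer's original code, that guards the last yield; the name paged_lists is made up for illustration.

```python
def paged_lists(page_size, iterable):
    # Hypothetical variant of grouper() above: same loop, guarded final yield.
    page = []
    for item in iterable:
        page.append(item)
        if len(page) == page_size:
            yield page
            page = []
    if page:  # skip the trailing empty page
        yield page

print(list(paged_lists(3, range(10))))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(list(paged_lists(3, range(6))))   # [[0, 1, 2], [3, 4, 5]]
```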

S.Lott
  • Thanks for answering my question and providing a good way to think about how to just loop through the iterator. I think that there is a small error -- did you mean to append the item to the page -- as in: def grouper(page_size,iterable): page= [] for item in iterable: if len(page) == page_size: yield page page= [] else: page.append(item) yield page – Raymond Yee Feb 28 '10 at 16:31
  • @raymondyee: Actually, there's a better way. Your version harbors a bug. Try it and see that it skips an item. – S.Lott Feb 28 '10 at 16:43
  • @S.Lott -- yes, of course, I put my page.append(item) in the wrong place. Thanks for the correction. I'm still learning about when itertools can help and when there's no need for it. Any guidelines to offer? – Raymond Yee Feb 28 '10 at 16:49
  • @raymondyee: No advice. I don't use itertools all that often. Generator functions are very simple. – S.Lott Feb 28 '10 at 17:29
3

I'd do it like this:

from itertools import izip_longest  # Python 2 name; on Python 3 use zip_longest

def pager(iterable, page_size):
    args = [iter(iterable)] * page_size
    fillvalue = object()
    for group in izip_longest(fillvalue=fillvalue, *args):
        yield (elem for elem in group if elem is not fillvalue)

That way, None can be a legitimate value that the iterator spits out. Only the single object fillvalue is filtered out, and it cannot possibly be an element of the iterable.
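To make the point about None concrete, here is a small check of the same approach, adapted to Python 3 (zip_longest instead of izip_longest); the sample input is made up for illustration.

```python
from itertools import zip_longest  # named izip_longest on Python 2

def pager(iterable, page_size):
    args = [iter(iterable)] * page_size
    fillvalue = object()  # unique sentinel; cannot appear in the input
    for group in zip_longest(*args, fillvalue=fillvalue):
        # keep everything except the padding sentinel, including None
        yield (elem for elem in group if elem is not fillvalue)

pages = [list(page) for page in pager([1, None, 3, None, 5], 2)]
print(pages)  # [[1, None], [3, None], [5]]
```

The None values survive, and only the padding in the last page is removed.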

Matt Anderson
  • Thanks, Matt. You made me realize that I was both not allowing for None to be a legit value from the iterator and I was not accounting for the fillvalue. – Raymond Yee Feb 28 '10 at 00:52
0
def group_by(iterable, size):
    """Group an iterable into lists that don't exceed the size given.

    >>> list(group_by([1,2,3,4,5], 2))
    [[1, 2], [3, 4], [5]]

    """
    sublist = []

    for index, item in enumerate(iterable):
        if index > 0 and index % size == 0:
            yield sublist
            sublist = []

        sublist.append(item)

    if sublist:
        yield sublist
Wilfred Hughes
0

Based on the pointer to the itertools recipe for grouper(), I came up with the following adaptation of grouper() to mimic Pager. I wanted to filter out any None results and wanted to return an iterator rather than a tuple (though I suspect that there might be little advantage in doing this conversion).

# based on http://docs.python.org/library/itertools.html#recipes
from itertools import izip_longest  # Python 2 name; on Python 3 use zip_longest

def grouper2(n, iterable, fillvalue=None):
    args = [iter(iterable)] * n
    for item in izip_longest(fillvalue=fillvalue, *args):
        yield iter(filter(None,item))

I'd welcome feedback on what I can do to improve this code.
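One thing to watch in grouper2: filter(None, item) drops every falsy element, not just None, so values such as 0 or an empty string disappear from the pages. Below is a small sketch of the difference, written for Python 3 (zip_longest). The grouper2_keep_falsy variant is hypothetical and only one possible fix; it still cannot tell padding apart from a real None in the input.

```python
from itertools import zip_longest

def grouper2(n, iterable, fillvalue=None):
    args = [iter(iterable)] * n
    for item in zip_longest(*args, fillvalue=fillvalue):
        yield iter(filter(None, item))  # removes ALL falsy items

def grouper2_keep_falsy(n, iterable):
    # Hypothetical variant: drop only None, keep 0, '', etc.
    args = [iter(iterable)] * n
    for item in zip_longest(*args):
        yield (x for x in item if x is not None)

data = [0, 1, 2, 3, 4]
print([list(p) for p in grouper2(2, data)])             # [[1], [2, 3], [4]]
print([list(p) for p in grouper2_keep_falsy(2, data)])  # [[0, 1], [2, 3], [4]]
```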

Raymond Yee
0

more_itertools.chunked will do exactly what you're looking for:

>>> from more_itertools import chunked
>>> list(chunked([1, 2, 3, 4, 5, 6], 3))
[[1, 2, 3], [4, 5, 6]]

If you want the chunking without creating temporary lists, you can use more_itertools.ichunked.

That library also has lots of other nice options for efficiently grouping, windowing, slicing, etc.
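If installing more_itertools is not an option, similar list-based chunking can be sketched with only the standard library. This is an illustration of the idea, not the library's implementation, and the name chunked_sketch is made up here.

```python
from itertools import islice

def chunked_sketch(iterable, n):
    it = iter(iterable)
    # iter(callable, sentinel) keeps calling the lambda until it returns []
    return iter(lambda: list(islice(it, n)), [])

print(list(chunked_sketch([1, 2, 3, 4, 5, 6, 7], 3)))
# [[1, 2, 3], [4, 5, 6], [7]]
```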

Jake Biesinger