I'm looking for a nice way to write a generator that takes a stream of items from another list / generator / iterable and groups them.

Splitting items is easy. For example, if we want to take lines of a file and split them into characters:

def lines2chars(filename):
    with open(filename) as fh:
        for line in fh:                 # Iterate over items
            for char in line:           # Split items up
                yield char              # Yield smaller items

Grouping them, to produce paragraphs for example, seems tricky. This is what I've come up with:

def lines2para(filename):
    with open(filename) as fh:
        paragraph = []                  # Start with an empty group

        while True:                     # Infinite loop, ended when the file runs out
            try:
                line = next(fh)         # Get a line
            except StopIteration:
                                        # If there isn't one...
                                        # do whatever necessary
                return                  # and end the generator (re-raising StopIteration
                                        # here is a RuntimeError since Python 3.7, PEP 479)
            else:
                paragraph.append(line)  # Add to the group of items
                if line == "\n":        # If we've got a whole group
                    yield paragraph     # yield it
                    paragraph = []      # and start a new group

It's not pretty in my opinion. It's using the internals of the iteration protocol, has an infinite loop that's broken out of, and just doesn't read well to me. So has anyone got a nicer way of writing this type of code?

Bear in mind I'm looking for the pattern, rather than a solution to this specific example. In my case I'm reading data that's split across packets, which are themselves split across packets, but each level is similar to the paragraph example.

Paul S
  • What's wrong with http://docs.python.org/library/itertools.html#itertools.groupby? – luke14free Apr 12 '12 at 09:02
  • Are the answers to this question any help? http://stackoverflow.com/q/3862010/311220 – Acorn Apr 12 '12 at 09:06
  • @luke14free Hmmm, definitely works for the paragraph example, but you've made me realise that for my packeted data stream I need to keep a running count of how much data is in each packet. My example wasn't good enough. I'll update my question in a bit, but I need to get to work else I'll be even later than I already am. – Paul S Apr 12 '12 at 09:13

1 Answer

import itertools as it

def lines2para(filename):
    with open(filename) as fh:
        for k, v in it.groupby(fh, key=lambda x: bool(x.strip())):
            if k:
                yield list(v)
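
When the group boundary is a marker item rather than a key shared by every member, a plain `for` loop with a final flush may read more easily than `groupby`, and it avoids both `next()` and `while True`. This is only a sketch of that pattern: `group_until` and `is_boundary` are hypothetical names, not part of `itertools`.

```python
def group_until(items, is_boundary):
    """Yield lists of items, closing each group after a boundary item."""
    group = []                      # Start with an empty group
    for item in items:              # Works with any iterable, not just files
        group.append(item)
        if is_boundary(item):       # Hypothetical predicate supplied by the caller
            yield group
            group = []              # Start a new group
    if group:                       # Flush a trailing group with no closing boundary
        yield group


lines = ["a\n", "b\n", "\n", "c\n"]
paragraphs = list(group_until(lines, lambda line: line == "\n"))
```

Unlike the `groupby` version, this keeps the blank separator line in each group, as the code in the question does, and it also yields a final paragraph that is not followed by a blank line.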
eumiro
  • You've answered my question, but I've realised my question wasn't quite what I need. I'll re-edit in a bit – Paul S Apr 12 '12 at 09:14
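
For the follow-up case mentioned in the comments, where a running count of the data in each packet is needed, the same loop shape can carry extra state alongside the group. The following is a rough sketch only: `group_by_size` and `packet_size` are hypothetical, and it assumes a group is complete once the accumulated length reaches a fixed size, which may not match the real packet format.

```python
def group_by_size(chunks, packet_size):
    """Yield lists of chunks whose combined length reaches packet_size."""
    group, count = [], 0                # Current group and its running byte count
    for chunk in chunks:
        group.append(chunk)
        count += len(chunk)             # Keep the running count up to date
        if count >= packet_size:        # Assumed completion rule (hypothetical)
            yield group
            group, count = [], 0        # Reset state for the next group
    if group:                           # Flush whatever is left at the end
        yield group


chunks = [b"ab", b"cd", b"e", b"fgh", b"i"]
packets = list(group_by_size(chunks, packet_size=4))
```

Because the function accepts any iterable, several such generators can be chained, one per level of packet nesting.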