I'm looking for a nice way to write a generator that takes a stream of items from another list / generator / iterable and groups them.
Splitting items is easy. For example, if we want to take lines of a file and split them into characters:
def lines2chars(filename):
    with open(filename) as fh:
        for line in fh:          # Iterate over items
            for char in line:    # Split items up
                yield char       # Yield smaller items
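As a minimal sketch, assuming Python 3.3+ for `yield from`, the inner loop can be delegated to the string's own iterator. Pulling the splitting into a helper that accepts any iterable also makes it reusable beyond files; `split_items` is a hypothetical name introduced here for illustration.

```python
def split_items(items):
    """Yield each element of each item, e.g. each character of each line."""
    for item in items:      # Iterate over items
        yield from item     # Split items up (Python 3.3+ delegation)


def lines2chars(filename):
    """Same behaviour as the nested-loop version, built on split_items."""
    with open(filename) as fh:
        yield from split_items(fh)
```

Because `split_items` only needs an iterable, it can be tested on a plain list of strings without touching the filesystem.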
Grouping them, to produce paragraphs for example, seems tricky. This is what I've come up with:
def lines2para(filename):
    with open(filename) as fh:
        paragraph = []                  # Start with an empty group
        while True:                     # Infinite loop to be ended by exception
            try:
                line = next(fh)         # Get a line
            except StopIteration:
                # If there isn't one, yield any unfinished last group...
                if paragraph:
                    yield paragraph
                # ...and end the generator. Re-raising StopIteration here
                # becomes a RuntimeError since Python 3.7 (PEP 479).
                return
            else:
                paragraph.append(line)  # Add to the group of items
                if line == "\n":        # If we've got a whole group
                    yield paragraph     # yield it
                    paragraph = []      # and start a new group
It's not pretty, in my opinion. It uses the internals of the iteration protocol, has an infinite loop that has to be broken out of, and just doesn't read well to me. Has anyone got a nicer way of writing this kind of code?
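For comparison, here is a minimal sketch of the same grouping written with a plain `for` loop, which consumes `StopIteration` internally so no `try`/`next` is needed. It assumes the input is any iterable of lines (a file object works) and also yields a final paragraph that isn't followed by a blank line. `group_lines` is a hypothetical helper name.

```python
def group_lines(lines):
    """Group lines into paragraphs; each paragraph ends with a "\n" line."""
    paragraph = []                  # Start with an empty group
    for line in lines:              # The for loop handles StopIteration for us
        paragraph.append(line)      # Add to the group of items
        if line == "\n":            # If we've got a whole group
            yield paragraph         # yield it
            paragraph = []          # and start a new group
    if paragraph:                   # Input ended mid-paragraph
        yield paragraph             # so yield the leftover group too


def lines2para(filename):
    with open(filename) as fh:
        yield from group_lines(fh)
```

Keeping the file handling in a thin wrapper means the grouping logic works the same on lists, generators, or sockets wrapped as line iterators.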
Bear in mind that I'm looking for the general pattern rather than this specific example. In my case I'm reading data that's split across packets, which are in turn split across packets, but each level is similar to the paragraph example.
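One way to express the pattern generically, sketched under the assumption that every level marks the end of a group with a recognisable final item, is a single helper parameterised by a predicate. Because each level is a lazy generator, levels can be stacked, which is what nested packets need. `group_until` and `is_last` are hypothetical names; for real packet data, `is_last` would test whatever end-of-group marker the format actually carries.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def group_until(items: Iterable[T],
                is_last: Callable[[T], bool]) -> Iterator[List[T]]:
    """Collect items into lists; a list closes on an item where is_last is true."""
    group: List[T] = []
    for item in items:
        group.append(item)
        if is_last(item):
            yield group
            group = []
    if group:  # trailing group with no terminating item
        yield group


# Stacking two levels: characters -> lines -> paragraphs.
chars = "ab\n\ncd\n"
lines = ("".join(g) for g in group_until(chars, lambda c: c == "\n"))
paras = list(group_until(lines, lambda line: line == "\n"))
# paras == [["ab\n", "\n"], ["cd\n"]]
```

`itertools.groupby` is an alternative when each item carries a key, but it groups runs of equal keys rather than closing a group on a terminator item, so it fits this shape of data less directly.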