Here you go, tested code. It uses `while True:` to loop, and lets `itertools.takewhile()` do everything with `lines`. When `itertools.takewhile()` reaches the end of input, it returns an iterator that does nothing except raise `StopIteration`, which `list()` simply turns into an empty list, so a simple `if not block:` test detects the empty list and breaks out of the loop.
    import itertools

    def not_tabline(line):
        # True for every line except one holding only a tab character.
        return '\t' != line.rstrip('\n')

    def block_generator(file):
        with open(file) as lines:
            while True:
                # Collect lines up to (and consume) the next tab-only line.
                block = list(itertools.takewhile(not_tabline, lines))
                if not block:
                    break
                yield block

    for block in block_generator("test.txt"):
        print("BLOCK:")
        print(block)
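To see the end-of-input behaviour in isolation, here is a small sketch that swaps the file for an in-memory iterator of lines (the sample data is made up for illustration). Each `takewhile()` call consumes the tab-only separator it stops on, and a call on an exhausted iterator gives `list()` nothing to collect.

```python
import itertools

def not_tabline(line):
    return '\t' != line.rstrip('\n')

# Stand-in for an open file: two groups separated by a tab-only line.
lines = iter(["a\n", "b\n", "\t\n", "c\n"])

first = list(itertools.takewhile(not_tabline, lines))   # stops at, and consumes, the tab line
second = list(itertools.takewhile(not_tabline, lines))  # runs to the end of input
third = list(itertools.takewhile(not_tabline, lines))   # input exhausted: empty list

print(first)
print(second)
print(third)
```

The empty `third` is exactly what the `if not block:` test in the loop relies on.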
As noted in a comment below, this has one flaw: if the input text has two lines in a row with just the tab character, this loop will stop processing without reading all the input text. I cannot think of any way to handle this cleanly; it's really unfortunate that the iterator you get back from `itertools.takewhile()` uses `StopIteration` both as the marker for the end of a group and as what you get at end-of-file. To make it worse, I cannot find any way to ask a file iterator object whether it has reached end-of-file or not. And to make it even worse, `itertools.takewhile()` seems to advance the file iterator to end-of-file instantly; when I tried to rewrite the above to check on our progress using `lines.tell()`, it was already at end-of-file after the first group. (That is probably the file iterator's read-ahead buffer rather than `takewhile()` itself: iterating over a file reads ahead in chunks, so `tell()` reports the buffered position, not the line you are on.)
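Here is a minimal reproduction of that flaw, again with an in-memory iterator instead of a file. `blocks_from` is just an illustrative name for the same loop as `block_generator` with the `open()` call removed, and the sample data is made up.

```python
import itertools

def not_tabline(line):
    return '\t' != line.rstrip('\n')

def blocks_from(lines):
    # Same loop as block_generator, but over any iterator of lines.
    while True:
        block = list(itertools.takewhile(not_tabline, lines))
        if not block:
            break
        yield block

# Two tab-only lines in a row between "a" and "b".
sample = iter(["a\n", "\t\n", "\t\n", "b\n"])

result = list(blocks_from(sample))
leftover = list(sample)

print(result)    # only the block before the double separator
print(leftover)  # the line after it was never read by the loop
```

The second tab-only line produces an empty block, which the loop cannot tell apart from end of input, so it breaks with `"b\n"` still unread.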
I suggest using the `itertools.groupby()` solution. It's cleaner.
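That solution is not reproduced here, but as a rough sketch of the idea (my own illustration, with the hypothetical names `is_tabline` and `blocks_by_groupby`, not necessarily the code from the other answer): `groupby()` splits the lines into runs of separators and runs of non-separators, so consecutive tab-only lines collapse into one run that is simply skipped.

```python
import itertools

def is_tabline(line):
    return line.rstrip('\n') == '\t'

def blocks_by_groupby(lines):
    # groupby() yields (key, group) pairs for runs of consecutive
    # lines that share the same key; keep only the non-separator runs.
    for is_separator, group in itertools.groupby(lines, key=is_tabline):
        if not is_separator:
            yield list(group)

# Made-up sample with two tab-only lines in a row.
sample = ["a\n", "\t\n", "\t\n", "b\n", "c\n"]
grouped = list(blocks_by_groupby(sample))
print(grouped)
```

Because end of input simply ends the `for` loop, there is no empty-block test to be fooled by back-to-back separators. An open file object can be passed in place of the list.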