I have a text file like this:
11
2
3
4

11

111
Using Python 2.7, I want to turn it into a list of lists of lines, where line breaks divide items in the inner list and empty lines divide items in the outer list. Like so:
[["11","2","3","4"],["11"],["111"]]
And for this purpose, I wrote a generator function that would yield the inner lists one at a time once passed an open file object:
def readParag(fileObj):
    currentParag = []
    for line in fileObj:
        stripped = line.rstrip()
        if len(stripped) > 0: currentParag.append(stripped)
        elif len(currentParag) > 0:
            yield currentParag
            currentParag = []
That works fine, and I can call it from within a list comprehension, producing the desired result. However, it subsequently occurred to me that I might be able to do the same thing more concisely using itertools.takewhile (with a view to rewriting the generator function as a generator expression, but we'll leave that for now). This is what I tried:
from itertools import takewhile

def readParag(fileObj):
    yield [ln.rstrip() for ln in takewhile(lambda line: line != "\n", fileObj)]
In this case, the resulting generator yields only one result (the expected first one, i.e. ["11","2","3","4"]). I had hoped that calling its next method again would cause it to evaluate takewhile(lambda line: line != "\n", fileObj) again on the remainder of the file, thus leading it to yield another list. But no: I got a StopIteration instead. So I surmised that the takewhile expression was being evaluated once only, at the time when the generator object was created, and not each time I called the resultant generator object's next method.
This supposition made me wonder what would happen if I called the generator function again. The result was that it created a new generator object that also yielded a single result (the expected second one, i.e. ["11"]) before throwing a StopIteration back at me. So in fact, writing this as a generator function effectively gives the same result as if I'd written it as an ordinary function and returned the list instead of yielding it.
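The behaviour described above can be reproduced with a minimal sketch, assuming an in-memory io.StringIO object stands in for the open file (that stand-in and the names sample, gen, first, exhausted and second are illustrative, not part of the original code). It suggests that the function body simply runs once: the single yield produces one list, the generator then finishes, and a fresh generator picks up wherever the shared file iterator stopped.

```python
from io import StringIO
from itertools import takewhile

def readParag(fileObj):
    # The function body runs once, so this single yield produces one list.
    yield [ln.rstrip() for ln in takewhile(lambda line: line != "\n", fileObj)]

# Assumption: StringIO stands in for the open file from the question.
sample = StringIO(u"11\n2\n3\n4\n\n11\n\n111\n")

gen = readParag(sample)
first = next(gen)                  # reads up to and including the first blank line
exhausted = next(gen, None)        # None: there is no second yield to resume at
second = next(readParag(sample))   # a new generator continues from the file position
```

Here first holds the first paragraph, exhausted is None because the generator has already finished, and second holds the next paragraph, since takewhile consumed (and discarded) the blank line that ended the first one.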
I guess I could solve this problem by creating my own class to use instead of a generator (as in John Millikin's answer to this question). But the point is that I was hoping to write something more concise than my original generator function (possibly even a generator expression). Can somebody tell me what I'm doing wrong, and how to get it right?
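One possible way to get the paragraph-by-paragraph behaviour concisely is sketched below. Note that it swaps takewhile for itertools.groupby, which is a different technique from the one attempted above, and it assumes that lines containing only whitespace count as separators (as in the original rstrip-based version). The StringIO sample is only a stand-in for a real file.

```python
from io import StringIO
from itertools import groupby

def readParag(fileObj):
    # Strip trailing whitespace lazily; blank lines become empty strings.
    stripped = (line.rstrip() for line in fileObj)
    # groupby batches consecutive lines by truthiness: runs of non-empty
    # lines form paragraphs, runs of empty lines are the separators.
    for nonblank, group in groupby(stripped, key=bool):
        if nonblank:
            yield list(group)

# Assumption: StringIO stands in for the open file.
sample = StringIO(u"11\n2\n3\n4\n\n11\n\n111\n")
paragraphs = list(readParag(sample))
```

The same idea should also fit in a generator expression, for example (list(g) for k, g in groupby((ln.rstrip() for ln in fileObj), key=bool) if k), because each group is turned into a list before groupby advances to the next one.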