
I'm writing a parser object, and I'd like to understand the best practice for indicating the end of the file. It seems like the calling code should look like this:

myFp = newParser(filename)  # package opens a file pointer and scans through multiple lines to get an entry
for entry in myFp:
    yourCode(entry)

I know that raising an exception is generally better than returning a status value, but this case seems to be handled differently.

Since the parser reads multiple lines to build each entry, I can't just pass the result of readline() back up the call chain, and the while loop in my current implementation runs forever at the end of the file.

def read(this):
    entry = ''
    while 1:
        line = this.fp.readline()
        if line == '\\':   # '\\' marks the end of one entry
            return entry
        else:
            entry += line
    return entry

Can someone show me how to structure the object's reading code so that the loop exits at the end of the file?

shigeta
  • What is the question? – jsalonen Jun 01 '14 at 14:19
  • You should use [`StopIteration`](https://docs.python.org/2/library/exceptions.html#exceptions.StopIteration) (see [iterator types](https://docs.python.org/2/library/stdtypes.html#typeiter)), but you should also look into [context managers](https://docs.python.org/2/reference/datamodel.html#context-managers): `with newParser(filename) as myFp: for entry in myFp: ...` – jonrsharpe Jun 01 '14 at 14:23
  • This is 100% a use case for a context manager combined with a generator. – aruisdante Jun 01 '14 at 14:32
  • FWIW: `def read(this): return ''.join(iter(iter(this.fp).__next__, '\\'))`. Also, don't use `+=` on variable length immutable containers like this. You're asking for trouble. – Veedrac Jun 01 '14 at 14:36
  • The comments sound useful, but my Python isn't good enough to implement a solution from them. – shigeta Jun 01 '14 at 14:41
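To make those suggestions concrete, here is a minimal sketch of the kind of object the comments describe: an iterator that raises StopIteration when the file is exhausted (which is exactly what makes a `for` loop stop) and that also implements the context-manager protocol so `with` closes the file. The class name `EntryParser` and the use of '\\' as the end-of-entry marker are only illustrative, borrowed from the question, not from any real package.

class EntryParser:
    # Sketch: yields one multi-line entry per iteration and stops at end of file.

    def __init__(self, filename, delimiter='\\'):
        self.fp = open(filename)
        self.delimiter = delimiter

    def __iter__(self):
        return self

    def __next__(self):
        entry = ''
        for line in self.fp:
            if line.rstrip('\n') == self.delimiter:
                return entry              # one complete entry
            entry += line
        if entry:
            return entry                  # final entry with no trailing delimiter
        raise StopIteration               # end of file: the for loop exits here

    # context-manager protocol, so `with` closes the file automatically
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.fp.close()
        return False

With something like that, the loop from the question becomes:

with EntryParser(filename) as myFp:
    for entry in myFp:
        yourCode(entry)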

1 Answer


So here is a greatly simplified example, since you haven't really indicated exactly what your parser does. But it should at least help you with the general concept. First, let's build a generator that will yield tokens as it iterates across the file.

In this simplified example, I'm going to assume that each token is contained on a single line and that each line contains only one token, but you can probably see how to extend it when those constraints don't hold.

def produce_token(line):
    # Produce a token from a line here; as a stand-in,
    # just return the stripped line itself.
    return line.strip()

def tokenizer(file_to_tokenize):
    # will iterate over a file, producing a token from each line
    for line in file_to_tokenize:
        # If a token isn't constrained to a line, you don't
        # HAVE to yield every iteration. You can also yield
        # more than once per iteration
        yield produce_token(line)
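If your entries actually span several lines and end with a marker line like the '\\' in your read() method, you can adapt the same idea and only yield once a full entry has been collected. This is just a sketch; `entry_tokenizer` and the `delimiter` argument are illustrative names, not part of any existing API:

def entry_tokenizer(file_to_tokenize, delimiter='\\'):
    # accumulate lines until the delimiter line, then yield one token per entry
    entry_lines = []
    for line in file_to_tokenize:
        if line.rstrip('\n') == delimiter:
            yield produce_token(''.join(entry_lines))
            entry_lines = []
        else:
            entry_lines.append(line)
    if entry_lines:
        # don't silently drop a final entry that has no trailing delimiter
        yield produce_token(''.join(entry_lines))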

Next, let's write a context manager (using contextlib's contextmanager decorator) that will allow us to automagically produce a tokenizer from a file name, and will handle closing the file at the end:

from contextlib import contextmanager

@contextmanager
def tokenize_file(filename):
    # open the file, hand back a tokenizer over it, and
    # close the file when the with-block exits
    with open(filename) as f:
        yield tokenizer(f)

And here is how you'd use it:

filename = 'tokens.txt'
with tokenize_file(filename) as tokens:
    for token in tokens:
        # do something with each token
        print(token)

Hopefully that gets you pointed in the right direction. Obviously my toy example is so simple that there isn't really much benefit over just iterating over the lines directly (it would be simpler to just write `[produce_token(line) for line in token_file]`). But if your tokenization procedure is more complex and you expand the generator accordingly, this approach can make things much simpler when you actually go to use it.

aruisdante