152

In pre-historic times (Python 1.4) we did:

fp = open('filename.txt')
while 1:
    line = fp.readline()
    if not line:
        break
    print(line)

after Python 2.1, we did:

for line in open('filename.txt').xreadlines():
    print(line)

before we got the convenient iterator protocol in Python 2.3, and could do:

for line in open('filename.txt'):
    print(line)

I've seen some examples using the more verbose:

with open('filename.txt') as fp:
    for line in fp:
        print(line)

is this the preferred method going forwards?

[edit] I get that the with statement ensures closing of the file... but why isn't that included in the iterator protocol for file objects?

Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
thebjorn
  • 26,297
  • 11
  • 96
  • 138
  • 5
    imho, the last suggestion is no more verbose than the one before. It just does more work (ensures the file gets closed when you're done). – azhrei Jul 19 '12 at 07:01
  • 1
    @azhrei it's one line more, so objectively it is more verbose. – thebjorn Jul 19 '12 at 07:05
  • 8
    I get what you're saying but I'm just saying comparing apples with apples, the second last suggestion in your post needs exception handling code as well to match what the last option does. So in practice it's more verbose. I guess it depends on context which of the last two options is best, really. – azhrei Jul 19 '12 at 07:07
  • I found this to be a clearly better version of the question than the more popular canonicals. The question doesn't include a strange example (trying to create a cartesian product of lines in the file, which comes with several of its own problems), and has answers that are both properly scoped and give a solid explanation. It also happens to show working code up front, which makes it easier to use for people who get here from a search engine. – Karl Knechtel Aug 09 '22 at 01:29
  • @KarlKnechtel The added version ranges (inside the code samples) don't add anything of use over just when the feature was introduced. It is assumed that if something is introduced in a version it is the intended way forwards. The title says that this question is about reading line-by-line and not reading binary files (which is very different). If you want to edit the last example to use the print-function rather than the print-statement that would seem appropriate - having two versions that only differ in whether there are parens around print doesn't seem clarifying. – thebjorn Aug 09 '22 at 18:08
  • @KarlKnechtel ...actually, all the prints could be converted to the parenthesis form since that will work for py2 as well in this case (and might be less confusing for future readers). – thebjorn Aug 09 '22 at 18:12

3 Answers3

249

There is exactly one reason why the following is preferred:

with open('filename.txt') as fp:
    for line in fp:
        print(line)

We are all spoiled by CPython's relatively deterministic reference-counting scheme for garbage collection. Other, hypothetical implementations of Python will not necessarily close the file "quickly enough" without the with block if they use some other scheme to reclaim memory.

In such an implementation, you might get a "too many files open" error from the OS if your code opens files faster than the garbage collector calls finalizers on orphaned file handles. The usual workaround is to trigger the GC immediately, but this is a nasty hack and it has to be done by every function that could encounter the error, including those in libraries. What a nightmare.

Or you could just use the with block.

Bonus Question

(Stop reading now if are only interested in the objective aspects of the question.)

Why isn't that included in the iterator protocol for file objects?

This is a subjective question about API design, so I have a subjective answer in two parts.

On a gut level, this feels wrong, because it makes iterator protocol do two separate things—iterate over lines and close the file handle—and it's often a bad idea to make a simple-looking function do two actions. In this case, it feels especially bad because iterators relate in a quasi-functional, value-based way to the contents of a file, but managing file handles is a completely separate task. Squashing both, invisibly, into one action, is surprising to humans who read the code and makes it more difficult to reason about program behavior.

Other languages have essentially come to the same conclusion. Haskell briefly flirted with so-called "lazy IO" which allows you to iterate over a file and have it automatically closed when you get to the end of the stream, but it's almost universally discouraged to use lazy IO in Haskell these days, and Haskell users have mostly moved to more explicit resource management like Conduit which behaves more like the with block in Python.

On a technical level, there are some things you may want to do with a file handle in Python which would not work as well if iteration closed the file handle. For example, suppose I need to iterate over the file twice:

with open('filename.txt') as fp:
    for line in fp:
        ...
    fp.seek(0)
    for line in fp:
        ...

While this is a less common use case, consider the fact that I might have just added the three lines of code at the bottom to an existing code base which originally had the top three lines. If iteration closed the file, I wouldn't be able to do that. So keeping iteration and resource management separate makes it easier to compose chunks of code into a larger, working Python program.

Composability is one of the most important usability features of a language or API.

Mateen Ulhaq
  • 24,552
  • 19
  • 101
  • 135
Dietrich Epp
  • 205,541
  • 37
  • 345
  • 415
  • 1
    +1 because it explains the "when" in my comment on the op ;-) – azhrei Jul 19 '12 at 07:04
  • even with alternate implementation, the with handler is only going to cause you problems for programs that opens hundreds of files in a very quick successions. Most programs can get by with the dangling file reference with no issue. Unless you disable it, eventually the GC will kick in sometime and clear up the file handle. `with` does give you the peace of mind though, so it's still a best practice. – Lie Ryan Jul 19 '12 at 07:12
  • @LieRyan: Dangling references are an unrelated concept. – Dietrich Epp Jul 19 '12 at 07:21
  • 1
    @DietrichEpp: perhaps "dangling file reference" were not the right words, I really meant file handles that were no longer accessible but not closed yet. In any case, the GC will close the file handle when it collects the file object, therefore as long as you don't have extra references to the file object and you don't disable GC and you're not opening many files in quick succession, you're unlikely to get "too many files open" due to not closing the file. – Lie Ryan Jul 19 '12 at 07:59
  • 1
    Yes, that's exactly what I mean by "if your code opens files faster than the garbage collector calls finalizers on orphaned file handles". – Dietrich Epp Jul 19 '12 at 12:28
  • 1
    The bigger reason to use with is that if you don't close the file, it won't necessarily get written immediately. – Antimony Apr 02 '14 at 21:45
25

Yes,

with open('filename.txt') as fp:
    for line in fp:
        print(line)

is the way to go.

It is not more verbose. It is more safe.

Mateen Ulhaq
  • 24,552
  • 19
  • 101
  • 135
eumiro
  • 207,213
  • 34
  • 299
  • 261
5

if you're turned off by the extra line, you can use a wrapper function like so:

def with_iter(iterable):
    with iterable as iter:
        for item in iter:
            yield item

for line in with_iter(open('...')):
    ...

in Python 3.3, the yield from statement would make this even shorter:

def with_iter(iterable):
    with iterable as iter:
        yield from iter
Lie Ryan
  • 62,238
  • 13
  • 100
  • 144
  • 2
    call the function xreadlines.. and put it in a file named xreadlines.py and we're back to Python 2.1 syntax :-) – thebjorn Jul 19 '12 at 11:42
  • 1
    @thebjorn: perhaps, but the Python 2.1 example you cited were not safe from unclosed file handler in alternate implementations. A Python 2.1 file reading that is safe from unclosed file handler would take at least 5 lines. – Lie Ryan Jul 19 '12 at 12:33