
Traditional Python teaching says that if you want to loop over the lines of a file, you should open it with a context manager, like this:

with open(filename) as f:
    for line in f:
        do_something_with(line)

Recently, I heard someone make an argument for doing the same thing with the following paradigm:

for line in open(filename):
    do_something_with(line)

In CPython, I believe that the file object created by open(filename) will be disposed of after no references to it remain. That should happen once the for loop stops executing (i.e., it naturally finishes, an error occurs, it encounters a break statement, etc.).

Indeed, on my machine, running the following code

import weakref

def wrap_file_io(obj):
    # Store the weak reference globally: it must stay alive for its callback to run.
    global w
    w = weakref.ref(obj, lambda c: print("The file object has been disposed of."))
    return obj

def read_file():
    print('Reading file...')
    for line in wrap_file_io(open('file.txt')):
        print(line.strip())

    print('Doing some other expensive operation...')
    for i in range(1000):
        for j in range(1000):
            i ** j
    print("Expensive operation complete.")

if __name__ == '__main__':
    read_file()

produces the output

Reading file...
These are the contents
of file.txt.
There are three lines.
The file object has been disposed of.
Doing some other expensive operation...
Expensive operation complete.

This indicates that the file object is disposed of before the expensive operation starts.
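The experiment above only covers a loop that finishes normally. The sketch below, assuming CPython's reference counting, checks the other two exit paths mentioned earlier (`break` and an exception raised inside the loop). `tracked_open`, `disposed`, `exit_with_break`, and `exit_with_exception` are hypothetical names for this illustration; `weakref.finalize` records the moment the file object is actually freed, much like the weakref callback in `wrap_file_io`.

```python
import os
import tempfile
import weakref

disposed = []  # labels of file objects that have been freed (hypothetical helper state)

def tracked_open(path, label):
    """Hypothetical helper: open *path* and append *label* to `disposed` when freed."""
    f = open(path)
    weakref.finalize(f, disposed.append, label)
    return f

def exit_with_break(path):
    for line in tracked_open(path, "break"):
        break  # leaving the loop drops its only reference to the file
    return "break" in disposed

def exit_with_exception(path):
    try:
        for line in tracked_open(path, "exception"):
            raise RuntimeError("error inside the loop")
    except RuntimeError:
        pass  # by the time the handler runs, the loop's iterator has been discarded
    return "exception" in disposed

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w") as out:
        out.write("first\nsecond\n")
    print(exit_with_break(path))      # True on CPython
    print(exit_with_exception(path))  # True on CPython
```

On implementations without reference counting, such as PyPy, both checks may report `False` until a garbage collection happens to run.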

Opening a file like this makes me somewhat nervous, but given how reliably CPython deallocates objects, are there any problems with looping over for line in open(filename)? Can you provide a specific piece of code that might be problematic without the context manager?
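One candidate for problematic code, sketched under the assumption of CPython: put the same loop inside a generator. The file is then referenced from the suspended generator's frame, so it stays open for as long as the generator object lives, even if the caller stopped reading after the first line. `read_lines`, `tracked_open`, `disposed`, and `demo` are hypothetical names used only for this illustration.

```python
import os
import tempfile
import weakref

disposed = []  # labels of file objects that have been freed (hypothetical helper state)

def tracked_open(path, label):
    """Hypothetical helper: open *path* and append *label* to `disposed` when freed."""
    f = open(path)
    weakref.finalize(f, disposed.append, label)
    return f

def read_lines(path):
    # No `with` block: the file is referenced only by this generator's frame.
    for line in tracked_open(path, "generator"):
        yield line.rstrip("\n")

def demo(path):
    gen = read_lines(path)
    first = next(gen)  # the caller reads one line and moves on
    open_while_suspended = "generator" not in disposed
    gen.close()  # only closing (or exhausting/deleting) the generator frees the file
    released_after_close = "generator" in disposed
    return first, open_while_suspended, released_after_close

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w") as out:
        out.write("first\nsecond\nthird\n")
    print(demo(path))  # ('first', True, True) on CPython
```

Moving the loop into `with open(path) as f:` inside the generator does not help on its own, because the `with` block is suspended at the `yield` too. What releases the file is exhausting or closing the generator, for example by consuming it through `contextlib.closing(read_lines(path))`.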

  • In fact it's disposed of before garbage collection, because its refcount goes to zero. The problem lies partly with other implementations that garbage collect in the background without refcounting (the file may stay open too long), and partly with issues Python has when destroying objects (`__del__` isn't always reliably called). However, I open file handles outside of `with` blocks and have never been burned. – tdelaney Jun 11 '20 at 00:11
  • @tdelaney, that's fascinating. Can you elaborate on issues that python has with destroying objects? Why doesn't "disposing" of an object include calling `__del__`? – pythonista912 Jun 11 '20 at 00:15
  • @pythonista912 there are edge cases where it might not happen, for example, for objects that are still alive when the interpreter exits. In any case, resource leaks are really a problem for long-running programs, like some sort of server. For a script that is meant to run and exit, it doesn't really matter: when the process terminates, these resources are reclaimed by the OS. – juanpa.arrivillaga Jun 11 '20 at 00:23
  • Disposal (as in garbage collection) would normally not be entirely deterministic: once the reference count drops to zero, the object **can** be garbage collected, but it does not have to be collected immediately. Closing files (handles) when done is a practice not specific to Python. They are a limited resource (and can be configured to be even more limited on a per-system / per-user basis). – Ondrej K. Jun 11 '20 at 00:34
  • I consider using context managers mainly a convenience, since doing so makes sure the file gets closed no matter how the following block of code is exited (which is important because it makes sure any data put into the file's internal buffer gets written out to the physical file when outputting to one). In addition, since the number of files that can be open at one time on a system is limited, it's also a good way, regardless of whether you're using CPython or not, of ensuring that limit isn't exceeded, although in practice I've found that to seldom be an issue; your own mileage may vary. – martineau Jun 11 '20 at 01:14
  • @pythonista912 - A couple of cases are when an exception is raised in `__init__`, especially when using a custom `__new__` (not really a problem for file objects, but generally true for classes), and when a function exits with an exception while the object is still referenced from its stack frame (so the refcount isn't zero). There are other exception-within-an-exception cases, I believe. This is all from memory, though. – tdelaney Jun 11 '20 at 03:14
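tdelaney's stack-frame case can be sketched as follows, again assuming CPython. It applies when the file is bound to a local name rather than used anonymously as in `for line in open(filename)`: a saved exception keeps its traceback, the traceback keeps the failed function's frame, and the frame keeps the local `f`, so the file stays open. `parse_header`, `tracked_open`, `disposed`, and `demo` are hypothetical names for this illustration.

```python
import gc
import os
import tempfile
import weakref

disposed = []  # labels of file objects that have been freed (hypothetical helper state)

def tracked_open(path, label):
    """Hypothetical helper: open *path* and append *label* to `disposed` when freed."""
    f = open(path)
    weakref.finalize(f, disposed.append, label)
    return f

def parse_header(path):
    f = tracked_open(path, "local")  # bound to a name and never explicitly closed
    header = f.readline()
    raise ValueError(f"bad header: {header!r}")

def demo(path):
    saved = None
    try:
        parse_header(path)
    except ValueError as exc:
        saved = exc  # e.g. kept around for later logging or re-raising
    # saved -> __traceback__ -> parse_header's frame -> local `f`: file still open
    open_while_saved = "local" not in disposed
    del saved
    gc.collect()  # also break any reference cycles, in case some remain
    released = "local" in disposed
    return open_while_saved, released

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w") as out:
        out.write("not a header\n")
    print(demo(path))  # (True, True) on CPython
```

Opening the file with `with tracked_open(path, "local") as f:` inside `parse_header` would still leave the file object alive while the exception is saved, but it would already be closed.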
