
Up until now, I have been making an egregious error in my Python development: I have been assuming that streams get closed once their corresponding objects go out of scope. Specifically, I assumed that when a file or an instance of a class inheriting `io.IOBase` invoked `__del__`, it would run the object's `close` method. However, upon executing the following code, I noticed that this is certainly not the case.

def wrap_close(old_close):
    def new_close(*args, **kwargs):
        print("closing")
        return old_close(*args, **kwargs)
    return new_close

f = open("tmp.txt")
f.close = wrap_close(f.close)
f.close() # prints "closing"

f = open("tmp.txt")
f.close = wrap_close(f.close)
del f # nothing printed

My question is, what is the benefit of not closing a file or stream automatically when its `__del__` method is invoked? It seems like the implementation would be trivial, but I imagine there must be a reason for allowing streams to stay open after their corresponding object goes out of scope.

James Mchugh
  • The problem is that closing a file descriptor is an OS-level task that only properly gets done when you call `.close()`. `del` doesn't have any special behavior built-in to deal with streams - it just removes the object from the namespace, and then the garbage collector composts it. Neither of them care that there's an open file descriptor somewhere, because how would they know? – Green Cloak Guy Feb 14 '20 at 12:58
  • Printing the actual file descriptors (via ``f.fileno()``) shows that they are re-used, i.e. the file is closed. Note that ``.close`` is a Python level function, whereas ``_io`` is implemented in C and its classes will directly call its C functions, not the Python wrappers. – MisterMiyagi Feb 14 '20 at 13:04
  • @GreenCloakGuy But don't Python classes have the ability to determine what is done when their instances are deleted using the `__del__` magic method? Could they not invoke `close` within those methods? – James Mchugh Feb 14 '20 at 13:08
  • @MisterMiyagi So in essence, `__del__` cannot be used here since `_io` is implemented in C and `__del__` would also be a Python level function? – James Mchugh Feb 14 '20 at 13:11
  • James: If you read the documentation closely, you'll see that there's no guarantee that deleted objects will ever be garbage collected. – martineau Feb 14 '20 at 13:12
  • @JamesMchugh Are you confusing ``__del__`` with ``del`` perhaps? Python does use the equivalent of ``_io.TextIOWrapper.__del__``, but it is a C function which *doesn't* call the Python level ``f.close``. – MisterMiyagi Feb 14 '20 at 13:17
  • @martineau I think I understand that, but my question is why is that the case? In a straight Python implementation, I could implement a class that inherits `io.IOBase` and then define `__del__` to invoke `close` when the instance is deleted. Is my understanding of this incorrect? – James Mchugh Feb 14 '20 at 13:18
  • @MisterMiyagi Possibly. My understanding was that `del` invoked `__del__` similarly to how `iter` invoked `__iter__`. Maybe my understanding is flawed. – James Mchugh Feb 14 '20 at 13:21
  • Yes, you can do that, and if a deleted object of that type is ever garbage-collected, it will be called. Don't know all the reasons why except that Python isn't like C++ in this regard. If you want guaranteed deletion, you will need to implement a context manager. – martineau Feb 14 '20 at 13:23
  • @JamesMchugh ``del`` only unlinks a name. As a result, this might push the reference count down to 0 (in CPython) or eventually trigger garbage collection (any implementation). It does not directly invoke ``__del__``. – MisterMiyagi Feb 14 '20 at 13:23
  • Does this answer your question: [Does Python GC close files too?](https://stackoverflow.com/questions/49512990/does-python-gc-close-files-too) – MisterMiyagi Feb 14 '20 at 13:24
  • @MisterMiyagi I see now, that was my misunderstanding then. Thank you for clearing that up. – James Mchugh Feb 14 '20 at 13:24
  • @MisterMiyagi That appears to. I did not see that question when initially searching for an answer to my question. This could be closed as a duplicate. – James Mchugh Feb 14 '20 at 13:27
  • Your example demonstrates one common case where you can't rely on `__del__` being invoked: if the script is exiting. There's no need to worry about handling a reference count of 0 if the entire interpreter is going away. – chepner Feb 14 '20 at 13:35
  • @chepner That is a good thing to know now. I should caveat this, though: the interpreter was not going away in the case of my example, as I ran it in an IPython interpreter which stayed alive afterwards. Sorry for not being clear on that. I have been working with Python for a while, but I still feel that I learn new things about it every day. Thank you all for the help. – James Mchugh Feb 14 '20 at 13:38
  • @chepner the example does not demonstrate that, the "callback" isn't invoked even if you force a full collection (and check that the object has not leaked). What it demonstrates is that you can't assume what order or state things will be collected in when you create a cycle: cpython can first `del f.close` and only then collect `f`. – Masklinn Feb 14 '20 at 13:47
  • @martineau if you create a cycle of `shared_ptr` C++ will absolutely never collect them ever. CPython may eventually collect them when it performs a GC run. That aside, deterministic destruction is part of CPython but not of the Python specification at large (PyPy doesn't use refcounting). – Masklinn Feb 14 '20 at 13:48
  • Yes, and the interpreter closing is one situation where you can't assume `__del__` will be called. Python doesn't specify anything about *how* garbage collection occurs, only that *you* don't need to worry about explicitly freeing memory. (Honestly, I think a Python implementation that simply assumes infinite memory, thus never needing to do garbage collection, would be valid, if not practical.) – chepner Feb 14 '20 at 13:53

1 Answer


`self.close` keeps a reference to `self`, so you're effectively writing `self.close = lambda: self.close()`, creating a circular reference. As a result:

  1. `del` does nothing by itself; CPython will not reclaim the object until an actual GC collection happens, either implicitly at some point or through an explicit `gc.collect()`
  2. CPython has to break the cycle, so by the time it gets around to collecting the object, it may already have removed the wrapped attribute from it
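The mechanics can be isolated with a toy class and a weak reference (a minimal sketch independent of the file machinery; `Node` is just an illustrative name):

```python
import gc
import weakref

class Node:
    pass

n = Node()
n.self_ref = n          # the instance now references itself: a cycle
probe = weakref.ref(n)  # lets us observe collection without keeping n alive

del n                        # unlinks the name; the cycle keeps the object alive
print(probe() is not None)   # True: refcounting alone could not reclaim it

gc.collect()                 # the cycle collector breaks the cycle
print(probe() is None)       # True: only now is the object gone
```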

You can see this very clearly if you replace the strong reference to `old_close` with a weak one:

import weakref

def wrap_close(old_close):
    # weak reference: does not keep the method (and thus the file) alive
    old_close = weakref.ref(old_close)
    def new_close(*args, **kwargs):
        print("closing")
        c = old_close()
        if c is not None:
            c(*args, **kwargs)
    return new_close

f = open('/dev/zero')
f.close = wrap_close(f.close)
f.close() # prints "closing"

f = open('/dev/zero')
f.close = wrap_close(f.close)
del f # prints "closing"

This prints "closing" not just twice but three times:

  • when close() is called explicitly
  • when the first `f` is garbage-collected (as the second one replaces it)
  • when the second one is del'd

My question is, what is the benefit of not closing a file or stream automatically when its `__del__` method is invoked?

Python absolutely closes the file object when it's finalised.
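You can check this at the descriptor level (a sketch assuming CPython's refcounting; the `gc.collect()` call is only a safety net for implementations that don't collect on a refcount of 0):

```python
import gc
import os

f = open(os.devnull)
fd = f.fileno()

del f          # in CPython the refcount hits 0 and the file is finalised
gc.collect()   # force a collection for non-refcounting implementations

# the raw descriptor is now closed, so stat-ing it fails
try:
    os.fstat(fd)
    print("descriptor still open")
except OSError:
    print("descriptor closed")   # this is what CPython prints
```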

Masklinn
  • A circular reference will not prevent an object from being collected. The CPython ``gc`` explicitly exists for circular references only. – MisterMiyagi Feb 14 '20 at 13:36
  • @MisterMiyagi it will not prevent an object from being collected *in the long term* but `del` will not do it (you can see that by creating a weakref to the file and checking for the weakref's liveness after the del, on a normal file the weakref is empty whereas here it's not). And here even after forcing a GC run, the "replacement method" never gets called, possibly because CPython decides to break the cycle by first unsetting the callable. – Masklinn Feb 14 '20 at 13:38
  • So in this case, the wrapper for `close` could potentially be deleted before the actual file stream. I did not take that into account. On top of that, just because `del` may reduce the reference count of an object to 0, that does not mean it will be GC'd at that point. I think I have a better understanding now. Thank you. – James Mchugh Feb 14 '20 at 13:56
  • @JamesMchugh CPython more or less guarantees that a refcount of 0 will cause collection (note that this is specific to **cpython**, it's not generally valid) however if you have a cycle the refcount is never 0, the object lives in an unreachable bubble of its own refcount. Also if you use weakrefs you can clearly see that there's no issue with the collection itself. – Masklinn Feb 14 '20 at 13:58
  • So using a weak reference removes that cycle since it doesn't protect the underlying object from GC, thus allowing the refcount to reach 0. That is super interesting. I did not even know that you could use weak references in Python. Thank you for making me aware of that. – James Mchugh Feb 14 '20 at 14:03