
I'm trying to implement a clean-up routine in a utility module I have. In looking around for solutions to my problem, I finally settled on using a weakref callback to do my cleanup. However, I'm concerned that it won't work as expected because of a strong reference to the object from within the same module. To illustrate:

foo_lib.py

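import weakref

class Foo(object):

    # Maps each instance's weak reference to the value reported at cleanup.
    _refs = {}

    def __init__(self, x):
        self.x = x
        # Register _clean to run when this instance is collected.
        self._weak_self = weakref.ref(self, Foo._clean)
        Foo._refs[self._weak_self] = x

    @classmethod
    def _clean(cls, ref):
        # Receives the (now dead) weak reference as its only argument.
        print('cleaned %s' % cls._refs[ref])

foo = Foo(1)  # 1 is a placeholder value for x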

Other classes then reference foo_lib.foo. I did find an old document from 1.5.1 that sort of references my concerns (http://www.python.org/doc/essays/cleanup/) but nothing that makes me fully comfortable that foo will be released in such a way that the callback will be triggered reliably. Can anyone point me towards some docs that would clear this question up for me?

Silas Ray
  • I don't think you want to be relying on weakref cleanup (or, equivalently, `__del__`) for whatever you're really doing, and it's hard to give a solid answer… but it's definitely an interesting question, and thanks for making me look at what interpreter finalization actually does, because there's some interesting stuff there as well. – abarnert Aug 12 '13 at 22:30

2 Answers


Python modules are cleaned up when exiting, and any __del__ methods are probably called, though this is not guaranteed:

It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.

Names starting with an underscore are cleared first:

Starting with version 1.5, Python guarantees that globals whose name begins with a single underscore are deleted from their module before other globals are deleted; if no other references to such globals exist, this may help in assuring that imported modules are still available at the time when the __del__() method is called.

Weak reference callbacks rely on the same mechanism as __del__ methods: the C deallocation functions (type->tp_dealloc).

The foo instance will retain a reference to the Foo._clean class method, but the global name Foo could be cleared already (it is assigned None in CPython); your method should be safe as it never refers to Foo once the callback has been registered.

Martijn Pieters
  • I specifically avoid `__del__` because of all the caveats. I came across http://code.activestate.com/recipes/519621-object-finalization-without-__del__-and-without-ha/ which made me think I was going in the right direction with `weakref`, but as I said, couldn't then track down any clarifying information on when the reference is released. Is a `weakref` callback really just as reliable as `__del__`? – Silas Ray Aug 12 '13 at 21:46
  • The same mechanisms call the weakref callback as would call `__del__`. – Martijn Pieters Aug 12 '13 at 21:50
  • Where do the docs guarantee that modules are actually cleaned up when exiting? That would be a useful link to have. – abarnert Aug 12 '13 at 21:53
  • @abarnert: I haven't found anything yet, other than the `object.__del__` documentation talking about `_` single-underscore names being cleared first. – Martijn Pieters Aug 12 '13 at 21:55
  • Hmm. Well, that's unfortunate. So what I'm actually trying to do is close out a subprocess that is normally kept open by a timer, but needs to be nuked when the program exits. Is the only really "reliable" way to do this to start a daemonic subprocess to monitor and kill the other process separately? – Silas Ray Aug 12 '13 at 21:56
  • Also… Python modules definitely _are_ cleaned up when their last reference goes away, just like any other value. Of course it can be tricky to know where all the references are, but if you save [this](http://pastebin.com/sdvsQb2i) as `dumb.py` and [this](http://pastebin.com/DLw6YLk2) as `dumber.py` and run the latter, you'll see that in simple cases it works as you'd expect. – abarnert Aug 12 '13 at 21:57
  • Sure, modules are cleaned up, but where this is *documented* is not clear. :-P – Martijn Pieters Aug 12 '13 at 22:00
  • Actually, I think the bit about `__del__` at exit may imply that they're _not_ cleaned up. If they were, there would be no objects alive at shutdown (or, at worst, only objects explicitly crammed into builtins), so there would be no issue about whether such nonexistent objects were deleted. – abarnert Aug 12 '13 at 22:08
  • OK, [`Py_Finalize`](http://docs.python.org/3.3/c-api/init.html#Py_Initialize) claims that all live objects, including modules, except for extension modules, are destroyed in arbitrary order, and their `__del__` methods called if there are no circular references. So, the bit about modules being cleaned up isn't directly relevant; the object may get deleted first, or its module may get deleted, freeing the last reference to the object, causing it to get deleted, or some other module or object may get deleted, freeing the last reference to the object, etc. – abarnert Aug 12 '13 at 22:13

The right thing to do here is to explicitly release your strong reference at some point, rather than counting on shutdown to do it.

In particular, if the module is released, its globals will be released… but it doesn't seem to be documented anywhere that the module will get released. So, there may still be a reference to your object at shutdown. And, as Martijn Pieters pointed out:

It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.

However, if you can ensure that there are no (non-weak) references to your object some time before the interpreter exits, you can guarantee that your cleanup runs.

You can use atexit handlers to clean up after yourself, or you can simply do it explicitly before falling off the end of your main module (or calling sys.exit, or finishing your last non-daemon thread, or whatever). The simplest thing to do is often to take your entire main function and wrap it in a with or try/finally.

Or, even more simply, don't try to put cleanup code into __del__ methods or weakref callbacks; just put the cleanup code itself into your with or finally or atexit.
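A minimal sketch of that approach might look like the following. `DisplayManager`, `start`, and `stop` are hypothetical names standing in for whatever resource needs cleaning up; the only real APIs used are `atexit.register` and `try`/`finally`.

```python
import atexit

class DisplayManager(object):
    """Hypothetical resource holder; stop() is safe to call repeatedly."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        if self.running:
            # Real cleanup (e.g. terminating a subprocess) would go here.
            self.running = False

manager = DisplayManager()

# Fallback: runs on normal interpreter exit, but not if the
# process is killed by an unhandled signal.
atexit.register(manager.stop)

def main():
    manager.start()
    try:
        return "work done"  # the real work goes here
    finally:
        manager.stop()      # runs on normal return and on exceptions

result = main()
```

Because `stop()` is idempotent, registering it with `atexit` and also calling it from `finally` does no harm.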


In a comment on another answer:

what I'm actually trying to do is close out a subprocess that is normally kept open by a timer, but needs to be nuked when the program exits. Is the only really "reliable" way to do this to start a daemonic subprocess to monitor and kill the other process separately?

The usual way to do this kind of thing is to replace the timer with something signalable from outside. Without knowing your app architecture and what kind of timer you're using (e.g., a single-threaded async server where the reactor kicks the timer vs. a single-threaded async GUI app where an OS timer message kicks the timer vs. a multi-threaded app where the timer is just a thread that sleeps between intervals vs. …), it's hard to explain more specifically.
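For the last of those cases (a thread that sleeps between intervals), one possible shape is to wait on a `threading.Event` instead of sleeping, so that shutdown code can wake the thread early. This is only a sketch under that assumption; `IdleTimer`, `cleanup`, and `shutdown` are hypothetical names.

```python
import threading

class IdleTimer(object):
    """Hypothetical timer thread that can be woken early from outside."""

    def __init__(self, timeout, cleanup):
        self._timeout = timeout
        self._cleanup = cleanup
        self._wake = threading.Event()
        self._thread = threading.Thread(target=self._run)
        self.timed_out = None

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait returns True if the event was set, False on timeout.
        self.timed_out = not self._wake.wait(self._timeout)
        self._cleanup()

    def shutdown(self):
        """Wake the thread now so cleanup runs without waiting out the timeout."""
        self._wake.set()
        self._thread.join()

calls = []
timer = IdleTimer(240.0, lambda: calls.append("display stopped"))
timer.start()
timer.shutdown()  # returns promptly instead of after 240 seconds
```

Calling `shutdown()` from a `try`/`finally` around the main routine would let the cleanup run at exit without holding the process open for the full timeout.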

Meanwhile, you may also want to look at whether there's a simpler way to handle your subprocesses. For example, maybe using an explicit process group, and killing your process group instead of your process (which will kill all of the children, on both Windows and Unix… although the details are very different)? Or maybe give the subprocess a pipe and have it quit when the other end of the pipe goes down?
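The pipe idea can be sketched as follows, assuming Python 3.3 or later and a child process you control. The child here is a hypothetical one-liner that blocks reading stdin and exits when it sees end-of-file; a real child would do its work in another thread or poll the pipe.

```python
import subprocess
import sys

# Hypothetical child: blocks until its stdin reaches end-of-file, then exits.
CHILD_CODE = "import sys; sys.stdin.read()"

child = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdin=subprocess.PIPE,
)
still_running = child.poll() is None  # child is waiting on the pipe

# Closing the write end is what the OS does on our behalf if the
# parent dies, provided no other process holds a copy of that end.
child.stdin.close()
exit_code = child.wait(timeout=10)
```

The appeal of this design is that the parent needs no cleanup code at all: the child notices the closed pipe even if the parent is killed abruptly.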


Note that the documentation also gives you no guarantees about the order in which left-over references are deleted, if they are. In fact, if you're using CPython, Py_Finalize specifically says that it's "done in random order".

The source is interesting. It's obviously not explicitly randomized, and it's not even entirely arbitrary. First it does GC collect until nothing is left, then it finalizes the GC itself, then it does a PyImport_Cleanup (which is basically just sys.modules.clear()), then there's another collect commented out (with some discussion as to why), and finally a _PyImport_Fini (which is defined only as "For internal use only").

But this means that, assuming your module really is holding the only (non-weak) reference(s) to your object, and there are no unbreakable cycles involving the module itself, your module will get cleaned up at shutdown, which will drop the last reference to your object, causing it to get cleaned up as well. (Of course you cannot count on anything other than builtins, extension modules, and things you have a direct reference to still existing at this point… but your code above should be fine, because foo can't be cleaned up before Foo, and it doesn't rely on any other non-builtins.)

Keep in mind that this is CPython-specific—and in fact CPython 3.3-specific; you will want to read the relevant equivalent source for your version to be sure. Again, the documentation explicitly says things get deleted "in random order", so that's what you have to expect if you don't want to rely on implementation-specific behavior.


Of course your cleanup code still isn't guaranteed to be called. For example, an unhandled signal (on Unix) or structured exception (on Windows) will kill the interpreter without giving it a chance to clean up anything. And even if you write handlers for that, someone could always pull the power cord. So, if you need a completely robust design, you need to be interruptible without cleanup at any point (by journaling, using atomic file operations, protocols with explicit acknowledgement, etc.).
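As one example of an atomic file operation, here is a sketch that writes to a temporary file and then renames it over the target, so a reader sees either the old contents or the new ones but never a partial write. It assumes Python 3.3 or later for `os.replace`; `atomic_write` is a hypothetical helper name.

```python
import os
import tempfile

def atomic_write(path, data):
    """Replace the file at path with data, all at once or not at all."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)
    except BaseException:
        # Leave no stray temp file behind if anything failed.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "state.txt")
atomic_write(target, "v1")
atomic_write(target, "v2")
with open(target) as f:
    content = f.read()
```

If the process is killed midway, the worst case is a leftover temporary file next to an intact `state.txt`.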

abarnert
  • What this is actually doing in my code is managing a performance optimization in a test framework. I have a virtual display manager that starts and stops a single virtual display instance across multiple tests, and keeps track of other code reserving and releasing the display. The timer kills the virtual display after 4 minutes of having no users reserving the display with a simple timer thread. However, if it hasn't yet been 4 minutes when nose exits the test suite, the manager leaves the display active (if monitor is a daemon) or holds the process open till the thread times out. – Silas Ray Aug 13 '13 at 00:46
  • I was actually looking at `atexit` at first, but thought it'd be nice to be more clever about it. Maybe I'll just be less clever. :) – Silas Ray Aug 13 '13 at 00:47
  • The 'random' order stems from the fact that cleanup ordering is tied to dictionaries; with 3.3 random hashing was introduced making it that little bit more random. See http://stackoverflow.com/questions/18163697/exception-typeerror-warning-sometimes-shown-sometimes-not-when-using-throw-meth for an example where that randomisation led to 'weird' behaviour. – Martijn Pieters Aug 13 '13 at 07:28
  • @MartijnPieters: That makes sense… except that the bit about "random order" was there in the docs for 2.6 through 3.2 as well. I suspect it's just a CYA term meant to discourage you from relying on any order that you happen to discover. (There have been changes in previous versions, even in bugfix releases, that would change that order.) But you're right that most of the cleanup is driven by most of `sys.modules` being cleaned up in dict-iteration order, which is more random in 3.3… – abarnert Aug 13 '13 at 19:06