
If I load a module in Python, will it ever be garbage collected? Another way of framing this question is: where does Python keep references to Python modules? I assume that once there are no longer any references, the garbage collector will remove a module.

Here's an example I tried in the Python interpreter:

>>> from importlib import import_module
>>> import sys
>>> import gc

>>> x = import_module('math')
>>> 'math' in sys.modules

This outputs:

True

So let's delete the reference to the module in the script.

>>> del x
>>> gc.collect()
>>> 'math' in sys.modules

Python still keeps track of the math module, as the output is still:

True

But if I now delete math from sys.modules, I am not aware of any further references:

>>> del sys.modules['math']
>>> gc.collect()

However, the output of gc.collect() is:

0

Nothing was garbage collected, even though the module is no longer in sys.modules or referenced by my script. Why was it not garbage collected?

Ninjakannon
  • How do you know it wasn't garbage collected? Sure, it wasn't collected by your `gc.collect()` call, but that's probably because it was collected right after you removed it from `sys.modules`. – Aran-Fey Mar 07 '18 at 10:56
  • Aran-Fey is right. Assuming you’re using CPython, it uses refcounting, so as soon as there are no references, the object goes away. The `gc.collect` call is only needed to handle cases where there are references, but only in a cycle between otherwise-unreferenced objects. – abarnert Mar 07 '18 at 11:00
  • What is the value of `sys.getrefcount(x)` to begin with? – cs95 Mar 07 '18 at 11:01
  • By the way, I think in 3.5+ you can test this by creating a subclass of ModuleType, adding a `__del__` method to it that just prints “I’m being deleted”, and then writing a trivial .py module that sets its `__class__` to that type, then import that instead of `math`. (Also, `math` is a weird one to test in the first place, as it’s not only a C API module, but also a core module that may be part of the startup bootstrap.) – abarnert Mar 07 '18 at 11:05
  • Curiously, the output of `gc.collect()` is still 0 even if you disable automatic garbage collection with `gc.disable()`. – Aran-Fey Mar 07 '18 at 11:06
  • @Aran-Fey Disabling automatic garbage collection only disables the periodic run of the cycle detector. Refs are still counted, and anything that goes to 0 refs is still deleted immediately. – abarnert Mar 07 '18 at 11:08
  • `sys.getrefcount(x)` is 48 straight after importing the module. That's higher than I expected! – Ninjakannon Mar 07 '18 at 11:09
  • https://stackoverflow.com/a/33398553/1319284 sys.get_referrers gets all the references to an object according to this answer – kutschkem Mar 07 '18 at 11:11
  • @Ninjakannon This may be because you’re doing it in the REPL and you end up with things like `_` being temporarily assigned to `math` and other things picking up references from there. Or it may be because each of the module's functions is a builtin function object with the module as its `__self__` attribute. Or it may be because Python is using the math module itself, and all you did by importing it is bump it from 46 refs to 48. Or... – abarnert Mar 07 '18 at 16:33
  • @kutschkem It's `gc.get_referrers`, by the way, not `sys`. – Ninjakannon Mar 07 '18 at 16:39

2 Answers


In general, at least in 3.4 and later, module objects shouldn’t be anything special in this regard. Of course normally there’s a reference to every loaded module in sys.modules, but if you’ve explicitly deleted that, a module should be able to go away.

That being said, there have definitely been problems in the past that prevent that from happening in some cases, and I wouldn’t promise that there aren’t any such problems left as of 3.7.
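If a module you expected to disappear is still alive, gc.get_referrers (mentioned in the comments on the question) can show what still refers to it. The sketch below is only an illustration: it builds a throwaway module object with types.ModuleType (the name "demo_mod" is hypothetical), registers it in sys.modules the way the import system would, and checks whether the sys.modules dict appears among its referrers before and after the entry is deleted. Note that gc.get_referrers only sees containers tracked by the cycle collector, so it is a debugging aid rather than a complete census.

```python
import gc
import sys
import types


def held_by_sys_modules(module):
    """Return True if the sys.modules dict is one of the module's referrers."""
    return any(referrer is sys.modules for referrer in gc.get_referrers(module))


# A fresh module object (hypothetical name "demo_mod") registered the way
# the import system registers a loaded module.
demo = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = demo
print(held_by_sys_modules(demo))  # sys.modules refers to the module

del sys.modules["demo_mod"]
print(held_by_sys_modules(demo))  # now only our own variable refers to it
```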

Unfortunately, your test is not actually testing anything. Presumably you’re using CPython. In CPython, memory management is based on reference counting: each object stores a count of the references to it, the count goes up whenever a new reference is created (for example, when a name is bound to the object) and down whenever one goes away, and the object is deleted immediately when the count reaches 0. What the gc module exposes is a separate cycle collector, which is only needed for the special case where two (or more) objects refer to each other but nothing else refers to them. If the module isn’t part of such a cycle, it’ll be deleted as soon as its last reference goes away, before you ever call gc.collect(), so of course that call returns 0. But that 0 tells you nothing about whether the module was freed.
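The difference between reference counting and the cycle collector can be seen with ordinary objects. This is a minimal sketch assuming CPython 3.4 or later (where objects with `__del__` in reference cycles are collected); the class `Noisy` and its log are hypothetical stand-ins, not anything from the question. With automatic collection switched off via gc.disable(), an object without cycles is still freed the instant its count hits 0, while two objects that refer to each other linger until gc.collect() runs.

```python
import gc


class Noisy:
    """Hypothetical stand-in object whose finalizer records when it runs."""

    def __init__(self, log, name):
        self.log = log
        self.name = name
        self.partner = None

    def __del__(self):
        self.log.append(f"freed {self.name}")


def refcount_vs_cycle_demo():
    log = []
    was_enabled = gc.isenabled()
    gc.disable()  # stops only *automatic* cycle collection; refcounting still runs
    try:
        a = Noisy(log, "a")
        del a  # refcount drops to 0, so CPython frees it right here
        log.append("after del a")

        b = Noisy(log, "b")
        c = Noisy(log, "c")
        b.partner, c.partner = c, b  # reference cycle: b <-> c
        del b, c  # each still has a reference from the other, so nothing is freed
        log.append("after del b, c")

        gc.collect()  # the cycle collector finds the unreachable pair and frees it
        log.append("after gc.collect()")
    finally:
        if was_enabled:
            gc.enable()
    return log


print(refcount_vs_cycle_demo())
```

The order in which the collector finalizes b and c is not something to rely on; what matters is that "freed a" appears before "after del a", while b and c are only freed by the explicit collection.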

There are other problems with your test.

First, you should not test garbage collection in the interactive interpreter. All kinds of extra stuff gets kept around there, in ways that are complicated to explain. It’s much better to write a test script.

Second, you shouldn’t be using math as your test. It’s an extension module (that is, written in C rather than Python), and even after the major changes in 3.5, extension modules still don’t work quite the same way as pure-Python ones. It’s also a core module that may be imported during startup or otherwise needed by other parts of the interpreter, even if you aren’t referencing it from your code. So it’s far better to use something else.
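To pick a suitable test subject, it helps to know what kind of module you are dealing with. The helper below is a rough sketch, not an official API: `module_kind` is a hypothetical name, and it classifies a module purely from the `origin` of its spec as reported by importlib.util.find_spec. Whether math comes out as "built-in" or "extension" depends on how your Python was built.

```python
import importlib.machinery
import importlib.util


def module_kind(name):
    """Roughly classify a top-level module by where its spec says it comes from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    origin = spec.origin
    if origin in ("built-in", "frozen"):
        return origin  # compiled into the interpreter or frozen bytecode
    if origin and origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "extension"  # a compiled C extension (.so / .pyd)
    if origin and origin.endswith((".py", ".pyc")):
        return "pure Python"
    return "other"


for name in ("math", "sys", "json"):
    print(name, "->", module_kind(name))
```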

Anyway, I think there may be a way to test this directly, without using the debugger, but no promises on whether it’ll work.

First, you need to create a subclass of types.ModuleType with a __del__ method that prints out some message. Then you just need to import a module (a .py one, not an extension module) and set its __class__ to that subclass. This may be as simple as putting sys.modules[__name__].__class__ = MyModuleSubclass in the .py file (a bare __class__ = MyModuleSubclass at module level would only create a global variable named __class__, not change the module's type). Now, when the module gets collected, its destructor will run, and you’ll have proof that it was collected. (Well, proof that it was collected unless the destructor revived it, but if your destructor does nothing but print a static string, that hopefully isn’t a worry.)
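Here is a self-contained sketch of that idea, assuming CPython 3.5+ (where a module's `__class__` can be reassigned to a ModuleType subclass). To avoid depending on any installed package, it writes a throwaway pure-Python module to a temporary directory; the module name "tracked_mod" and the helper `import_and_forget` are hypothetical. The module reclasses itself through sys.modules[__name__], and the helper captures stdout so you can see that the destructor runs as soon as the last two references (our variable and the sys.modules entry) are deleted.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

# Source for a throwaway pure-Python module. It swaps its own class for a
# ModuleType subclass whose __del__ prints a message.
MODULE_SOURCE = '''\
import sys
import types

class _TrackedModule(types.ModuleType):
    def __del__(self):
        print("I am being deleted")

# The module object has to be reached through sys.modules; a bare
# "__class__ = ..." here would only create a global variable.
sys.modules[__name__].__class__ = _TrackedModule
'''


def import_and_forget(name="tracked_mod"):
    """Import a throwaway module, drop every reference to it, return its output."""
    out = io.StringIO()
    with tempfile.TemporaryDirectory() as tmp, contextlib.redirect_stdout(out):
        Path(tmp, name + ".py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            mod = importlib.import_module(name)
            del mod                # our own reference
            del sys.modules[name]  # the import system's reference
            print("after deleting references")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)  # cleanup in case something failed
    return out.getvalue().splitlines()


print(import_and_forget())
```

If the module was freed immediately, "I am being deleted" should come before "after deleting references" in the returned list.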

abarnert
  • Thanks, this was a useful answer because it addressed the misunderstandings I had in my question and led to a solution. I will also add to my question the short example that I wrote up based on your answer. – Ninjakannon Mar 07 '18 at 16:26

Based on the answer from abarnert, I created the following run-it-yourself example that demonstrates the behaviour I was trying to understand:

from types import ModuleType
from importlib import import_module
import sys

class MyModule(ModuleType):
    def __del__(self):
        print('I am being deleted')

if __name__ == '__main__':
    x = import_module('urllib3')
    x.__class__ = MyModule
    del x
    del sys.modules['urllib3'] # Comment this out and urllib3 will NOT be garbage collected before the script finishes
    print('finishing')

Output when run as is:

I am being deleted

finishing

Output with the del sys.modules['urllib3'] line commented out:

finishing

I am being deleted

This shows that modules are garbage collected like any other object once all references to them have been deleted. Unless the module in question is special in some way, that happens once both the references in your own code and the entry in sys.modules have been deleted.

Ninjakannon
  • Nice to know my “this should work...” actually did, and good job explaining what the results mean in a way that future searchers with the same question will find this all useful. – abarnert Mar 07 '18 at 16:37