77

TL;DR:

import gc, sys

print len(gc.get_objects()) # 4073 objects in memory

# Attempt to unload the module

import httplib
del sys.modules["httplib"]
httplib = None

gc.collect()
print len(gc.get_objects()) # 6745 objects in memory

UPDATE: I've contacted the Python developers about this problem, and indeed it's not going to be possible to unload a module completely "in next five years" (see the link).

Please accept that Python 2.x indeed does not support unloading modules, because of severe, fundamental, insurmountable technical problems.


During my recent hunt for a memleak in my app, I've narrowed it down to modules, namely my inability to garbage collect an unloaded module. Using any method listed below to unload a module leaves thousands of objects in memory. In other words - I can't unload a module in Python...

The rest of the question is an attempt to garbage collect a module somehow.

Let's try:

import gc
import sys

sm = sys.modules.copy()  # httplib, which we'll try to unload isn't yet 
                         # in sys.modules, so, this isn't the source of problem

print len(gc.get_objects()) # 4074 objects in memory

We save a copy of sys.modules so we can try restoring it later. The baseline is 4074 objects; ideally we should get back to this number somehow.

Let's import a module:

import httplib
print len(gc.get_objects()) # 7063 objects in memory

We're up to 7K non-garbage objects. Let's try removing httplib from sys.modules.

sys.modules.pop('httplib')
gc.collect()
print len(gc.get_objects()) # 7063 objects in memory

Well, that didn't work. Hmm, but isn't there a reference in __main__? Oh, yeah:

del httplib
gc.collect()
print len(gc.get_objects()) # 6746 objects in memory

Hooray, down 300 objects. Still, no cigar: that's way more than the original 4000 objects. Let's try restoring sys.modules from the copy.

sys.modules = sm
gc.collect()
print len(gc.get_objects()) # 6746 objects in memory

Hmmm, well, that was pointless: no change. Maybe if we wipe out globals...

globals().clear()
import gc # we need this since gc was in globals() too
gc.collect()
print len(gc.get_objects()) # 6746 objects in memory

locals?

locals().clear()
import gc # at module level locals() is globals(), so gc was wiped again
gc.collect()
print len(gc.get_objects()) # 6746 objects in memory

What the.. what if we imported a module inside of exec?

local_dict = {}
exec 'import httplib' in local_dict
del local_dict
gc.collect()
print len(gc.get_objects())  # back to 7063 objects in memory

Now, that's not fair, it imported it into __main__, why? It should never have left local_dict... Argh! We're back to a fully imported httplib. Maybe if we replace it with a dummy object?
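A note on why the exec attempt cannot help: the import statement always registers the module in sys.modules, whichever namespace dict receives the name. The following is a minimal Python 3 sketch (the question uses Python 2 syntax); colorsys is just an arbitrary small stdlib module chosen for illustration.

```python
import sys

namespace = {}
# Run the import with a private dict as its global namespace.
exec("import colorsys", namespace)

in_namespace = "colorsys" in namespace       # the name is bound in the private dict
in_globals = "colorsys" in globals()         # is it bound in the calling module?
in_sys_modules = "colorsys" in sys.modules   # the module object is cached here

print(in_namespace, in_globals, in_sys_modules)

# Dropping the private dict removes one reference only; sys.modules still
# holds the module, so it stays alive.
del namespace
still_loaded = "colorsys" in sys.modules
print(still_loaded)
```

So the object count going back up is most likely explained by httplib being imported afresh into sys.modules (it had been removed earlier), not by anything leaking into __main__.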

from types import ModuleType
import sys
print len(gc.get_objects())  # 7064 objects in memory

Bloody.....!!

sys.modules['httplib'] = ModuleType('httplib')
print len(gc.get_objects())  # 7066 objects in memory

Die modules, die!!

import httplib
for attr in dir(httplib):
    setattr(httplib, attr, None)
gc.collect()
print len(gc.get_objects())  # 6749 objects in memory

Okay, after all attempts, the best result is +2675 objects (roughly +66%) over the starting point... That's just from one module... One that doesn't even have anything big inside...

Ok, now seriously, where's my error? How do I unload a module and wipe out all of its contents? Or are Python's modules one giant memory leak?

Full source in simpler to copy form: http://gist.github.com/450606

unutbu
Slava V
  • 1. Have you ever examined, one by one, what exactly the remaining objects are? 2. You are trying to test using "python's" modules. Have you tried the same approach with a naive big module created by you? A module that does not reference anything outside itself? – ilias iliadis Nov 19 '19 at 17:08

5 Answers

29

Python does not support unloading modules.

However, unless your program loads an unlimited number of modules over time, that's not the source of your memory leak. Modules are normally loaded once at start up and that's it. Your memory leak most likely lies elsewhere.

In the unlikely case that your program really does load an unlimited number of modules over time, you should probably redesign your program. ;-)

Daniel Stutzbach
  • 2
    Yes, it does load a reasonably unlimited number of modules - it's a web app server that accepts new revisions of its own source code and reloads it (that's a pretty standard web task). The leak IS from the fact that old code is still out there in memory, even if replaced, even if unreachable... – Slava V Jun 23 '10 at 21:52
  • Python does support unloading modules. They're garbage collected, like every other object in Python. – Glenn Maynard Jun 23 '10 at 22:56
  • 2
    @Slava: You might want to take a look at the source code to `mod_python`, which has its own importer that is designed to handle reloading modules without producing memory leaks. There may be some code in there that you could use. – David Z Jun 23 '10 at 23:00
  • @David: Isn't `mod_python` written in C? – Slava V Jun 23 '10 at 23:14
  • 5
    @Glenn: they are garbage collectable objects, yes. So are True and False. Can you get the reference count down to 0? Not so easy. See also: http://bit.ly/9mvndb – Daniel Stutzbach Jun 23 '10 at 23:22
  • @Daniel: as sad as it is to accept - it seems that you are right about "Python does not support unloading modules". Thanks – Slava V Jun 23 '10 at 23:39
  • 2
    @Slava (3 comments up): partly, but not entirely. There is a lot of Python code in it too, including the importer. See [the source](http://svn.apache.org/viewvc/quetzalcoatl/mod_python/trunk/lib/python/mod_python/importer.py?view=markup). – David Z Jun 23 '10 at 23:42
  • I explained the reference count issue: it's tricky and usually not worth it. My problem is, as I said, that Python *does* support unloading modules. That's not a nitpick; it's an important implication of Python's modules that people should understand: module references are just like any other object references. (Yeah, native libraries are an annoying exception.) – Glenn Maynard Jun 23 '10 at 23:44
  • The `sys.modules.copy()` does create a reference, but even if I remove (or not create at all) this reference (and all others, even using `globals().clear()` and `locals().clear()` and `for key in module: setattr(module,key,None)`) - there are still thousands of stray objects in memory. That's what the question is about. I know the `sys.modules.copy()` and `sys` are references, I did remove them, the objects aren't gone. So, Daniel is right. Thanks for the attempt, though, to both of you. – Slava V Jun 23 '10 at 23:48
  • @David: great link to mod_python's importer! Thanks! – Slava V Jun 24 '10 at 00:18
10

I cannot find an authoritative perspective on this for Python 3 (10 years later, now Python 3.8). However, we can do better percentage-wise now.

import gc
import sys

the_objs = gc.get_objects()
print(len(gc.get_objects())) # 5754 objects in memory
origin_modules = set(sys.modules.keys())
import http.client # it was renamed ;)

print(len(gc.get_objects())) # 9564 objects in memory
for new_mod in set(sys.modules.keys()) - origin_modules:
    del sys.modules[new_mod]
    try:
        del globals()[new_mod]
    except KeyError:
        pass
    try:
        del locals()[new_mod]
    except KeyError:
        pass
del origin_modules
# importlib.invalidate_caches()  happens to not do anything
gc.collect()
print(len(gc.get_objects())) # 6528 objects in memory 

That is an increase of only 13%. If you look at the kinds of objects that show up in the new gc.get_objects() output, some of them are builtins, source code, random.* utilities, datetime utilities, etc. I am mostly leaving this here as an update to start the conversation for @shuttle and will delete it if we can make more progress.
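One way to look at what kinds of objects remain is to tally them by type. This is a minimal sketch, not a definitive tool: new_object_types is a hypothetical helper name, and it only sees objects tracked by the garbage collector (containers, classes, functions and so on, not plain ints or strings).

```python
import gc
from collections import Counter

def new_object_types(baseline, top=10):
    """Tally type names of gc-tracked objects that are not in `baseline`.

    `baseline` is a list returned by an earlier gc.get_objects() call.
    Holding on to that list keeps its objects alive, so their id() values
    cannot be reused by newer objects while we compare.
    """
    seen = {id(obj) for obj in baseline}
    seen.add(id(baseline))  # ignore the snapshot list itself
    fresh = (obj for obj in gc.get_objects() if id(obj) not in seen)
    return Counter(type(obj).__name__ for obj in fresh).most_common(top)

gc.collect()
baseline = gc.get_objects()

import http.client  # the module under investigation

for type_name, count in new_object_types(baseline):
    print(type_name, count)
```

The helper's own temporaries (the set, the generator) show up in the tally as a little noise. To see what survives an unload attempt, the same call can be repeated after deleting the sys.modules entries and running gc.collect().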

modesitt
6

I'm not sure about Python, but in other languages, calling the equivalent of gc.collect() does not release unused memory - it will only release that memory if/when the memory is actually needed.

Otherwise, it makes sense for Python to keep the modules in memory for the time being, in case they need to be loaded again.

BlueRaja - Danny Pflughoeft
  • The problem is that I need to replace them with new versions. And even when I replace it 1-to-1 with same size module - the memory usage grows (leaks)... Thanks for the suggestion, though. – Slava V Jun 23 '10 at 23:44
6

Python's small object manager rarely returns memory back to the operating system. From here and here. So, strictly speaking, Python has (by design) a kind of memory leak, even when objects are "gc collected".

ilias iliadis
0

(You should try to write more concise questions; I've only read the beginning and skimmed the rest.) I see a simple problem at the start:

sm = sys.modules.copy()

You made a copy of sys.modules, so now your copy has a reference to the module--so of course it won't be collected. You can see what's referring to it with gc.get_referrers.

This works fine:

# module1.py
class test(object):
    def __del__(self):
        print "unloaded module1"
a = test()

print "loaded module1"

.

# testing.py
def run():
    print "importing module1"
    import module1
    print "finished importing module1"

def main():
    run()
    import sys
    del sys.modules["module1"]
    print "finished"

if __name__ == '__main__':
    main()

module1 is unloaded as soon as we remove it from sys.modules, because there are no remaining references to the module. (Doing module1 = None after importing would work, too--I just put the import in another function for clarity. All you have to do is drop your references to it.)
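To find out which references still need dropping, gc.get_referrers (mentioned above) can be wrapped in a small helper. This is a Python 3 sketch under stated assumptions: module_referrers is a hypothetical name, and demo_target and demo_holder are throwaway modules built with types.ModuleType to stand in for a real module and a module that imported it.

```python
import gc
import sys
import types

def module_referrers(module):
    """Describe the objects that hold a direct reference to `module`."""
    described = []
    for ref in gc.get_referrers(module):
        if ref is sys.modules:
            described.append("sys.modules")
        elif isinstance(ref, dict) and "__name__" in ref:
            # Most likely the namespace (__dict__) of some module.
            described.append("namespace of %s" % ref["__name__"])
        else:
            described.append(type(ref).__name__)
    return described

# Hypothetical stand-ins: a module and another module that "imported" it.
target = types.ModuleType("demo_target")
holder = types.ModuleType("demo_holder")
holder.dep = target
sys.modules["demo_target"] = target

before = module_referrers(target)
del sys.modules["demo_target"]
after = module_referrers(target)

print(before)
print(after)
```

After the sys.modules entry is deleted, the holder's namespace (and the script's own namespace, which binds target) still refer to the module, which is why it would not be collected at that point.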

Now, it's a little tricky to do this in practice, because of two issues:

  • In order to collect a module, all references to the module must be unreachable (as with collecting any object). That means that any other modules that imported it need to be dereferenced and reloaded, too.
  • If you remove a module from sys.modules when it's still referenced somewhere else, you've created an unusual situation: the module is still loaded and used by code, but the module loader doesn't know about it anymore. The next time you import the module, you won't get a reference to the existing one (since you deleted the record of that), so it'll load a second coexisting copy of the module. This can cause serious consistency problems. So, make sure that there are no remaining references to the module before finally removing it from sys.modules.
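The second issue can be reproduced in a few lines. This is a Python 3 sketch; dupdemo is a hypothetical throwaway module written to a temporary directory purely for the demonstration.

```python
import importlib
import os
import shutil
import sys
import tempfile

# Create a throwaway module on disk and make it importable.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "dupdemo.py"), "w") as fh:
    fh.write("class Thing:\n    pass\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import dupdemo
old = dupdemo                 # a lingering reference, e.g. held by another module
del sys.modules["dupdemo"]    # the import system forgets the module...
import dupdemo as new         # ...so the source file is executed a second time

same_module = old is new
same_class = old.Thing is new.Thing
# Instances created through the old copy are not instances of the new class.
instance_check = isinstance(old.Thing(), new.Thing)
print(same_module, same_class, instance_check)

# Clean up the temporary module.
del sys.modules["dupdemo"]
sys.path.remove(tmpdir)
shutil.rmtree(tmpdir, ignore_errors=True)
```

All three values should come out False: two independent copies of the module now coexist, each with its own classes, which is the consistency problem described above.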

Using this in general raises some tricky problems: detecting which modules depend on the module you're unloading; knowing whether it's okay to unload those too (depends heavily on your use case); handling threading while examining all this (take a look at imp.acquire_lock), and so on.
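As a rough illustration of the first of those problems, here is a Python 3 sketch that lists loaded modules binding a given module as a top-level attribute. direct_importers is a hypothetical helper, and the demo_* modules are stand-ins created with types.ModuleType. It is deliberately incomplete: it does not see "from x import name" style imports, nor references held inside functions, classes or containers.

```python
import sys
import types

def direct_importers(target_name):
    """Names of loaded modules that bind the target module at top level."""
    target = sys.modules[target_name]
    importers = []
    for name, mod in list(sys.modules.items()):
        namespace = getattr(mod, "__dict__", None)
        if mod is target or not isinstance(namespace, dict):
            continue
        if any(value is target for value in list(namespace.values())):
            importers.append(name)
    return sorted(importers)

# Hypothetical modules standing in for real ones.
sys.modules["demo_lib"] = types.ModuleType("demo_lib")
user = types.ModuleType("demo_user")
user.demo_lib = sys.modules["demo_lib"]   # what "import demo_lib" would bind
sys.modules["demo_user"] = user
sys.modules["demo_bystander"] = types.ModuleType("demo_bystander")

importers = direct_importers("demo_lib")
print(importers)
```

Every module this reports would need to be dereferenced and reloaded too, and each of those may have importers of its own, so a real implementation would have to walk the dependency graph transitively.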

I could contrive a case where doing this might be useful, but most of the time I'd recommend just restarting the app if its code changes. You'll probably just give yourself headaches.

Glenn Maynard
  • 11
    Well, not to be snide, but you should've read the question, or at least the word "completely" in the title (or at least the tags). The problem isn't that I want to reload, the problem is the *memory leak* associated with any (listed) kind of module removal (including the ones that you proposed, which *are* listed in my question, along with a dozen others). Actually I added `sys.modules.copy()` at a very late stage, and removing it doesn't change anything (try it yourself). – Slava V Jun 23 '10 at 23:08
  • 1
    The source, to try: http://gist.github.com/450606 . Try removing the sys.modules.copy and you'll see that still there's more than 50% increase in objects even with all references to module removed. – Slava V Jun 23 '10 at 23:19
  • See here for short of what's wrong (using your code): http://gist.github.com/450726 . I don't try to load-unload `sys`, since we're operating on `sys.modules`, so I use `httplib` - you can try any other. – Slava V Jun 23 '10 at 23:29
  • To clarify: `httplib`, which I try to unload - is not in `sys.modules` yet, when I create a copy. – Slava V Jun 23 '10 at 23:59
  • +1 for "You'll probably just give yourself headaches." – jpmc26 Nov 18 '14 at 01:20