I was experimenting with replacing the sys.modules dictionary while working on an answer to another question and came across something interesting. The linked question deals with removing all the effects of importing a module. Based on another post, I came up with the idea of deleting all new modules from sys.modules after an import. My initial implementation was the following (testing with numpy as the module to load and unload):

# Load the module
import sys
mod_copy = sys.modules.copy()
print('numpy' in mod_copy, 'numpy' in sys.modules) # False False
import numpy
print('numpy' in mod_copy, 'numpy' in sys.modules) # False True
print(id(numpy)) # 45138472

The printouts show that numpy was imported successfully and that the shallow copy does not contain it, as expected.

My idea was to unload the module by swapping mod_copy back into sys.modules and then deleting the local reference to the module. In theory, that should remove all references to it (and possibly it does):

sys.modules = mod_copy
del numpy
print('numpy' in sys.modules) # False

This should be enough to allow re-importing the module, but when I run

import numpy
print('numpy' in sys.modules) # False
print(id(numpy)) # 45138472

it appears that the numpy module is not reloaded, since it has the same id as before. It also does not show up in sys.modules, even though the import statement raises no errors and appears to complete successfully (i.e., a numpy module exists in the local namespace).

On the other hand, the implementation that I made in my answer to the linked question does appear to work fine. It modifies the dictionary directly instead of swapping it out:

import sys
mod_copy = sys.modules.copy()
print('numpy' in mod_copy, 'numpy' in sys.modules) # False False
import numpy
print('numpy' in mod_copy, 'numpy' in sys.modules) # False True
print(id(numpy)) # 35963432

for m in list(sys.modules):
    if m not in mod_copy:
        del sys.modules[m]
del numpy
print('numpy' in sys.modules) # False

import numpy
print('numpy' in sys.modules) # True
print(id(numpy)) # (54941000 != 35963432)

I am using Python 3.5.2 on an Anaconda install. I am most interested in explanations focusing on Python 3, but I am curious about Python 2.7+ as well.

The only explanation I can think of is that the interpreter keeps another reference to the original sys.modules dictionary and uses that internal reference regardless of what I do to the public one. I am not sure that this covers everything, though, so I would like to know what is really going on.

Mad Physicist
  • On Python 2, I would expect the import machinery to use its own internal reference to the module cache, independent of the `sys.modules` reference, but as far as I can tell, on Python 3.5, the [import machinery](https://hg.python.org/cpython/file/3.5/Lib/importlib/_bootstrap.py) accesses the `modules` attribute of the `sys` module object whenever it needs the module cache. I don't know why `'numpy'` would fail to show up in `sys.modules` after replacing the dict and re-`import`ing `numpy`. – user2357112 Feb 09 '17 at 21:14
  • @user2357112. Well, it looks like there is some internal reference because the `id` of `numpy` is identical after both imports. As if it is using the original `modules` dictionary and ignoring the one that was swapped in. – Mad Physicist Feb 09 '17 at 21:19
  • The `id` being the same doesn't mean much; it's quite common for different objects to have the same ID if they have non-overlapping lifetimes. To see whether it's using the same object, you'd want to set something like `numpy.lookatme = 3` and see if that modification still shows up afterwards. – user2357112 Feb 09 '17 at 21:21
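
The marker-attribute check suggested in the last comment can be sketched as follows. This is a minimal illustration, not the original poster's code: `reimport_is_fresh` is a hypothetical helper name, and the small stdlib module `colorsys` stands in for numpy so the snippet runs without third-party packages. It removes the entry from `sys.modules` in place, which is the approach the question reports as working.

```python
import importlib
import sys


def reimport_is_fresh(name):
    """Hypothetical helper: True if an in-place cache deletion forces a fresh import."""
    first = importlib.import_module(name)
    first.lookatme = 3        # marker that only exists on the first module object
    del sys.modules[name]     # mutate the cache in place, do not rebind sys.modules
    second = importlib.import_module(name)
    # A fresh module object will not carry the marker, whatever its id() is.
    return not hasattr(second, "lookatme")


print(reimport_is_fresh("colorsys"))  # True
```

Unlike comparing `id` values, the marker cannot give a false match when a new object happens to reuse the address of a collected one.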

1 Answer


Even in Python 3.5, part of the import implementation is still written in C, and that part uses PyThreadState_GET()->interp->modules to retrieve the module cache, rather than going through the sys.modules attribute. Your import is finding numpy in the old sys.modules through one of those code paths.
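
As a rough analogy only (this is not CPython code), the effect of two references to one dictionary can be reproduced in pure Python. The class `FakeInterpreter` and its attributes are hypothetical names invented for this sketch: `_modules` plays the role of the internal C-level reference and `modules` plays the role of the public `sys.modules` attribute.

```python
class FakeInterpreter:
    """Hypothetical stand-in for an interpreter that keeps its own cache reference."""

    def __init__(self):
        self._modules = {}             # internal reference, like interp->modules
        self.modules = self._modules   # public name bound to the same dict

    def import_(self, name):
        # The lookup goes through the internal reference, never the public name.
        return self._modules.setdefault(name, object())


interp = FakeInterpreter()
first = interp.import_("numpy")
interp.modules = {}                    # rebinds the public name only
second = interp.import_("numpy")
print(second is first, "numpy" in interp.modules)  # True False
```

The output mirrors what the question observed: the same object comes back from the second import, yet the newly assigned public dictionary never sees the entry.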

sys.modules isn't designed to be replaced. The docs mention that replacing it may behave unexpectedly:

> This can be manipulated to force reloading of modules and other tricks. However, replacing the dictionary will not necessarily work as expected and deleting essential items from the dictionary may cause Python to fail.
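
Given that warning, a safer pattern is to mutate the existing dictionary rather than rebind `sys.modules`. The sketch below assumes a hypothetical helper named `drop_new_modules` and again uses the stdlib module `colorsys` as a stand-in for numpy; it only removes cache entries and does not undo other side effects of an import.

```python
import importlib
import sys


def drop_new_modules(snapshot):
    """Hypothetical helper: delete, in place, every cache entry not in `snapshot`."""
    removed = [name for name in list(sys.modules) if name not in snapshot]
    for name in removed:
        del sys.modules[name]
    return removed


sys.modules.pop("colorsys", None)   # make sure the demo module starts unloaded
snapshot = set(sys.modules)         # names only; the dict itself is never replaced
cache = sys.modules

importlib.import_module("colorsys")
removed = drop_new_modules(snapshot)
print("colorsys" in removed, "colorsys" in sys.modules, sys.modules is cache)
# True False True
```

Because the dictionary object stays the same, any internal reference the interpreter holds keeps pointing at the cache that was actually modified.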

user2357112
  • That makes sense. So basically there is more than one reference to the dictionary floating around. Do you think this merits a mention on the mailing list or as a bug report? – Mad Physicist Feb 10 '17 at 00:36
  • Also, does the behavior change in 3.6? – Mad Physicist Feb 10 '17 at 00:37
  • @MadPhysicist: The code looks pretty much the same on Python 3.6, but I didn't test it. As for a bug report, no. You're not expected to be able to replace the module cache. From the [docs](https://docs.python.org/3/library/sys.html#sys.modules): "This can be manipulated to force reloading of modules and other tricks. However, **replacing the dictionary will not necessarily work as expected** and deleting essential items from the dictionary may cause Python to fail." – user2357112 Feb 10 '17 at 00:40