
Using the deprecated imp module, I can write a custom import hook that modifies the source code of a module on the fly, before Python imports and executes it. Given the module's source code as a string named source, the essential code needed to create the module is the following:

import imp
import sys

module = imp.new_module(name)
sys.modules[name] = module
exec(source, module.__dict__)
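For comparison, here is a minimal sketch of the same three steps without imp, using types.ModuleType as the imp.new_module documentation recommends. The helper name module_from_source, the module name demo_mod and its source are hypothetical, chosen for illustration. Note that a module built this way has __spec__ and __loader__ set to None.

```python
import sys
import types


def module_from_source(name, source):
    """Create a module called `name` from a string of source code.

    Same steps as imp.new_module + exec, but with types.ModuleType.
    """
    module = types.ModuleType(name)
    sys.modules[name] = module      # register before exec, like the imp version
    exec(source, module.__dict__)   # run the code in the module's namespace
    return module


# Hypothetical module name and source, for illustration only.
mod = module_from_source("demo_mod", "x = 21 * 2\n")
print(mod.x)  # 42
```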

Since imp is deprecated, I would like to do something similar with importlib. [EDIT: there are other imp methods that need to be replaced to build a custom import hook, so the answer I am looking for is not simply a replacement for the above code.]

However, I have not been able to figure out how to do this. The importlib documentation describes a function, module_from_spec, that creates modules from "specs". As far as I can tell, specs are objects that carry their own loaders, and I see no obvious way to redefine those loaders so that a module can be created from a string.
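A spec does not have to come from a file: importlib.util.spec_from_loader accepts any loader object, so a loader that simply holds a string can feed module_from_spec. The sketch below assumes a hypothetical StringLoader class and module_from_string helper; it only illustrates that the spec route can create a module from a string.

```python
import importlib.abc
import importlib.util
import sys


class StringLoader(importlib.abc.Loader):
    """Hypothetical loader that executes source code held in a string."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        return None  # use the default module creation semantics

    def exec_module(self, module):
        # Compile with a pseudo-filename so tracebacks show something readable.
        code = compile(self.source, f"<{module.__name__}>", "exec")
        exec(code, module.__dict__)


def module_from_string(name, source):
    loader = StringLoader(source)
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)  # sets __spec__, __loader__, ...
    sys.modules[name] = module
    loader.exec_module(module)
    return module


mod = module_from_string("string_mod", "def greet():\n    return 'hi'\n")
print(mod.greet())        # hi
print(mod.__spec__.name)  # string_mod
```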

I have created a minimal example to demonstrate this; see the readme file for details.

André
  • If you take a look at the `imp.new_module` documentation, you'll find `Deprecated since version 3.4: Use types.ModuleType instead.` Does that not solve your problem? – Aran-Fey Apr 23 '17 at 13:28
  • I saw that imp.new_module had to be replaced this way, but the documentation recommends using module_from_spec (from importlib). I am using three methods from imp in a custom import hook and need to find their importlib equivalents. – André Apr 23 '17 at 13:32

2 Answers


find_module and load_module are both deprecated. You'll need to switch to find_spec and to the pair create_module/exec_module, respectively. See the importlib documentation for details.

You will also need to decide whether you want a MetaPathFinder or a PathEntryFinder, as they are invoked differently. A meta path finder is consulted first and can override builtin modules, whereas a path entry finder only handles modules found on sys.path entries.
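To see that a meta path finder is asked before anything else, the sketch below inserts a hypothetical RecordingFinder at the front of sys.meta_path. It only logs its arguments and returns None, so the normal finders still do the work. It also shows that for a top-level import the path argument is None, which is why the first example below checks for it. The choice of colorsys as a small pure-Python stdlib module is arbitrary.

```python
import importlib
import sys
from importlib.abc import MetaPathFinder


class RecordingFinder(MetaPathFinder):
    """Hypothetical finder that only records what it is asked, then declines."""

    def __init__(self):
        self.calls = []

    def find_spec(self, fullname, path, target=None):
        self.calls.append((fullname, path))
        return None  # let the next finder on sys.meta_path handle it


finder = RecordingFinder()
sys.meta_path.insert(0, finder)
try:
    sys.modules.pop("colorsys", None)  # force a fresh import
    importlib.import_module("colorsys")
finally:
    sys.meta_path.remove(finder)

print(finder.calls)
```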

The following is a very basic importer that attempts to replace the entire import machinery. It shows how to use the three methods (find_spec, create_module, and exec_module).

import sys
import os.path

from importlib.abc import Loader, MetaPathFinder
from importlib.util import spec_from_file_location

class MyMetaFinder(MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if path is None or path == "":
            path = [os.getcwd()] # top-level import: search the current working directory
        # only the last component of a dotted name is looked up on disk
        name = fullname.rpartition(".")[2]
        for entry in path:
            if os.path.isdir(os.path.join(entry, name)):
                # this module has child modules
                filename = os.path.join(entry, name, "__init__.py")
                submodule_locations = [os.path.join(entry, name)]
            else:
                filename = os.path.join(entry, name + ".py")
                submodule_locations = None
            if not os.path.exists(filename):
                continue

            return spec_from_file_location(fullname, filename, loader=MyLoader(filename),
                submodule_search_locations=submodule_locations)

        return None # we don't know how to import this

class MyLoader(Loader):
    def __init__(self, filename):
        self.filename = filename

    def create_module(self, spec):
        return None # use default module creation semantics

    def exec_module(self, module):
        with open(self.filename) as f:
            data = f.read()

        # manipulate data some way...

        # compiling with the filename gives readable tracebacks
        exec(compile(data, self.filename, "exec"), vars(module))

def install():
    """Inserts the finder into the import machinery"""
    sys.meta_path.insert(0, MyMetaFinder())
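As a self-contained usage sketch of the same meta path pattern, the version below fills in "manipulate data" with an assumed transformation (prepending a marker assignment) and, unlike the finder above, only answers for top-level modules in one given directory, so it cannot hijack unrelated imports. DirectoryFinder, TransformingLoader and the module name hooked_demo are hypothetical.

```python
import importlib
import os
import sys
import tempfile
from importlib.abc import Loader, MetaPathFinder
from importlib.util import spec_from_file_location


class TransformingLoader(Loader):
    def __init__(self, filename):
        self.filename = filename

    def create_module(self, spec):
        return None  # default module creation semantics

    def exec_module(self, module):
        with open(self.filename, encoding="utf-8") as f:
            source = f.read()
        # Assumed example manipulation: prepend a marker assignment.
        source = "__transformed__ = True\n" + source
        exec(compile(source, self.filename, "exec"), vars(module))


class DirectoryFinder(MetaPathFinder):
    """Only handles top-level .py modules found directly in `root`."""

    def __init__(self, root):
        self.root = root

    def find_spec(self, fullname, path, target=None):
        if path is not None:  # leave submodules to other finders
            return None
        filename = os.path.join(self.root, fullname + ".py")
        if not os.path.isfile(filename):
            return None  # not ours: fall through to the normal import system
        return spec_from_file_location(fullname, filename,
                                       loader=TransformingLoader(filename))


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "hooked_demo.py"), "w", encoding="utf-8") as f:
        f.write("VALUE = 42\n")

    finder = DirectoryFinder(root)
    sys.meta_path.insert(0, finder)
    try:
        mod = importlib.import_module("hooked_demo")
    finally:
        sys.meta_path.remove(finder)

print(mod.VALUE, mod.__transformed__)  # 42 True
```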

Next is a slightly more refined version that reuses more of the import machinery. As such, you only need to define how to get the source of the module.

import sys
from importlib import invalidate_caches
from importlib.abc import SourceLoader
from importlib.machinery import FileFinder


class MyLoader(SourceLoader):
    def __init__(self, fullname, path):
        self.fullname = fullname
        self.path = path

    def get_filename(self, fullname):
        return self.path

    def get_data(self, filename):
        """exec_module is already defined for us; we just have to provide a way
        of getting the source code of the module (SourceLoader expects bytes)"""
        with open(filename, encoding="utf-8") as f:
            data = f.read()
        # do something with data ...
        # eg. ignore it... return b"print('hello world')"
        return data.encode("utf-8")


loader_details = MyLoader, [".py"]  # finders created by this hook load only .py files (no .pyc or extension modules)

def install():
    # insert the path hook ahead of other path hooks
    sys.path_hooks.insert(0, FileFinder.path_hook(loader_details))
    # drop finders already cached for sys.path entries so the new hook is consulted
    sys.path_importer_cache.clear()
    invalidate_caches()
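Because the hook above registers only ".py", directories it takes over can no longer supply C extension modules or sourceless .pyc files. One way around that, sketched below under the assumption that you want stock behaviour for everything but source files, is to pass the standard importlib.machinery loaders alongside the custom one, in the same order the default FileFinder uses. MySourceLoader, make_path_hook, the appended marker line and the module name source_hook_demo are hypothetical; the demo restores sys.path and sys.path_hooks afterwards.

```python
import importlib
import importlib.machinery as machinery
import os
import sys
import tempfile
from importlib.abc import SourceLoader


class MySourceLoader(SourceLoader):
    def __init__(self, fullname, path):
        self.fullname = fullname
        self.path = path

    def get_filename(self, fullname):
        return self.path

    def get_data(self, filename):
        with open(filename, encoding="utf-8") as f:
            source = f.read()
        # Assumed example manipulation: append a marker assignment.
        source += "\n__transformed__ = True\n"
        return source.encode("utf-8")  # SourceLoader expects bytes


def make_path_hook():
    # Extension modules and bytecode keep their stock loaders; only
    # source files go through MySourceLoader.
    return machinery.FileFinder.path_hook(
        (machinery.ExtensionFileLoader, machinery.EXTENSION_SUFFIXES),
        (MySourceLoader, machinery.SOURCE_SUFFIXES),
        (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES),
    )


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "source_hook_demo.py"), "w", encoding="utf-8") as f:
        f.write("VALUE = 42\n")

    hook = make_path_hook()
    sys.path_hooks.insert(0, hook)
    sys.path.insert(0, root)
    sys.path_importer_cache.clear()
    importlib.invalidate_caches()
    try:
        mod = importlib.import_module("source_hook_demo")
    finally:
        sys.path.remove(root)
        sys.path_hooks.remove(hook)
        sys.path_importer_cache.clear()
        importlib.invalidate_caches()

print(mod.VALUE, mod.__transformed__)  # 42 True
```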
Dunes
  • Excellent, thanks! Could you extend your classes to handle namespace packages too? – Géry Ogam Oct 25 '18 at 07:07
  • Handle them how? The second example already works with namespaces. – Dunes Oct 25 '18 at 09:51
  • Yes but not the first one. That's the one that I am interested in. – Géry Ogam Oct 25 '18 at 15:12
  • I don't know what you're trying to achieve. Namespace packages do not have source code to modify (they lack `__init__.py` files). So there would be nothing to modify. I think you need to ask a separate question and clearly set out your desired behaviour and what you have already tried. – Dunes Oct 25 '18 at 18:58
  • Sorry, I should have been more specific. In the first example, you cannot modify a submodule of a namespace package; you need a regular package (that is, one with an `__init__.py` file). For instance, `import namespace.submodule` won't use the `MyLoader` class since the `MyMetaFinder` class will return `None`, as the `if not os.path.exists(filename):` test will always succeed. – Géry Ogam Oct 26 '18 at 07:11
  • Have you tried experimenting and seeing what happens when you use these loaders with namespace packages and their submodules? I just tried both and they both work just fine. The namespace package is loaded by a namespace loader. The namespace package then provides paths to search for child modules, which are passed to the custom loaders. They are able to find child modules on said paths and load the child modules (with whatever customisation you want). – Dunes Oct 26 '18 at 12:57
  • You are right, the first example works fine for namespace packages, even if it does not use the `MyLoader` class. And in the context of the question it makes sense not to use it, since the `MyLoader` class is used only to modify the source code of a file, and namespace packages have no `__init__.py` file. Actually I was trying to use your `MyMetaFinder` and `MyLoader` classes in the context of a virtual file system implemented as a Python dictionary. In this context, I needed to always use the `MyLoader` class. But now I have found the modifications to do to your classes to make that work. – Géry Ogam Oct 27 '18 at 13:20
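For the dictionary-backed case mentioned in the last comment, a possible sketch (not Géry Ogam's actual modifications, which are not shown) combines finder and loader in one hypothetical class, DictFinderLoader, and maps dotted names onto an assumed "pkg/mod.py" / "pkg/__init__.py" key layout. Every module then goes through the custom exec_module, including package __init__ files. Package specs get an empty submodule_search_locations, so child imports still reach this finder, which resolves them by full name rather than by path.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class DictFinderLoader(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules from a dict of {"pkg/mod.py": source} (hypothetical layout)."""

    def __init__(self, files):
        self.files = files

    def find_spec(self, fullname, path, target=None):
        base = fullname.replace(".", "/")
        for filename, is_pkg in ((base + "/__init__.py", True), (base + ".py", False)):
            if filename in self.files:
                return importlib.util.spec_from_loader(
                    fullname, self, origin=filename, is_package=is_pkg)
        return None  # not in the virtual file system

    def create_module(self, spec):
        return None  # default module creation semantics

    def exec_module(self, module):
        filename = module.__spec__.origin
        source = self.files[filename]
        # manipulate source here if needed
        exec(compile(source, filename, "exec"), module.__dict__)


files = {
    "vpkg/__init__.py": "NAME = 'vpkg'\n",
    "vpkg/util.py": "from . import NAME\ndef describe():\n    return NAME + '.util'\n",
}
finder = DictFinderLoader(files)
sys.meta_path.insert(0, finder)
try:
    util = importlib.import_module("vpkg.util")
finally:
    sys.meta_path.remove(finder)

print(util.describe())  # vpkg.util
```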

See also this nice project https://pypi.org/project/importhook/

pip install importhook

import importhook

# Setup hook to be called any time the `socket` module is imported and loaded into module cache
@importhook.on_import('socket')
def on_socket_import(socket):
    new_socket = importhook.copy_module(socket)
    setattr(new_socket, 'gethostname', lambda: 'patched-hostname')
    return new_socket

# Import module
import socket

# Prints: 'patched-hostname'
print(socket.gethostname())
  • I created my own: https://pypi.org/project/ideas/ which is more versatile. Still, thanks for the suggestion. – André May 26 '21 at 16:17