
Suppose I have a module file like this:

# my_module.py
print("hello")

Then I have a simple script:

# my_script.py
import my_module

This will print "hello".

Let's say I want to "override" the print() function so that it prints "world" instead. How could I do this programmatically (without manually modifying my_module.py)?


My thought is that I somehow need to modify the source code of my_module before or while importing it. Obviously, I cannot do this after importing it, so solutions using unittest.mock are not an option.

I also thought I could read the file my_module.py, modify its source, and then load it. But this is ugly, as it will not work if the module is located somewhere else.

The right solution, I think, is to make use of importlib.

I read the docs and found a very interesting method: get_source(fullname). I thought I could just override it:

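import importlib.machinery

class PatchedLoader(importlib.machinery.SourceFileLoader):  # hypothetical loader subclass
    def get_source(self, fullname):
        source = super().get_source(fullname)
        source = source.replace("hello", "world")
        return source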

Unfortunately, I am a bit lost with all these abstract classes and I do not know how to do this properly.

I tried, in vain:

import importlib.util

spec = importlib.util.find_spec("my_module")
spec.loader.get_source = mocked_get_source  # my replacement get_source() function
module = importlib.util.module_from_spec(spec)

Any help would be welcome, please.

martineau
Delgan
  • `my_module` does not define `print()` which is a built-in function in Python 3.x. – martineau Jan 25 '17 at 17:47
  • @martineau I do not understand what your point is. I use Python 3, so there is no problem using `print()` without defining it. – Delgan Jan 25 '17 at 20:19
  • You said you wanted to override the `print()` function, and I was just pointing out that it's not defined in the module you're importing. – martineau Jan 25 '17 at 20:33
  • @martineau I see, thank you, indeed I cannot properly "override" the print function, I should rather say that I want to monkey-patch it. – Delgan Jan 25 '17 at 20:45
  • Also note that doing it for `print()` might be different than just a general function because it's a built-in. – martineau Jan 25 '17 at 20:53

6 Answers


Here's a solution based on the content of this great talk. It allows any arbitrary modifications to be made to the source before importing the specified module. It should be reasonably correct as long as the slides did not omit anything important. This will only work on Python 3.5+.

import importlib.util  # "import importlib" alone does not guarantee importlib.util is loaded
import sys

def modify_and_import(module_name, package, modification_func):
    spec = importlib.util.find_spec(module_name, package)
    source = spec.loader.get_source(module_name)
    new_source = modification_func(source)
    module = importlib.util.module_from_spec(spec)
    codeobj = compile(new_source, module.__spec__.origin, 'exec')
    exec(codeobj, module.__dict__)
    sys.modules[module_name] = module
    return module

So, using this you can do

my_module = modify_and_import("my_module", None, lambda src: src.replace("hello", "world"))
Martin Valgur
  • Thank you for taking the time to help me! Your solution is probably the best way to go. – Delgan Jan 26 '17 at 10:07
  • For Python 3 (I thought Python 2 as well), you need to get rid of the `=None` part of `package=None`. Otherwise, you will get `SyntaxError: non-default argument follows default argument` – Roberto Jun 20 '19 at 16:03
  • There's also a video of David Beazley's [Modules and Packages](https://www.youtube.com/watch?v=0oTh1CXRaQ0) presentation on YouTube. – martineau Sep 30 '19 at 15:03
  • I have tested this solution using Python 3.6. `new_source` contains the code of the modified module; however, the returned module contains the original code. Any idea how to make it work? – carlorop Oct 31 '21 at 10:48
  • This `ModulePackage.pdf` is very informative and helpful, thanks. BTW, according to the [document](https://docs.python.org/3/library/importlib.html#importlib.util.find_spec) of `importlib.util.find_spec`, `If name is for a submodule (contains a dot), the parent module is **automatically imported**.`, which is probably not what we wanted. If `module_name='a.b'`, then we should change the second-to-last line of the function to `sys.modules['a'].b = module` followed by `sys.modules['a.b'] = module`. – Brainor Nov 16 '22 at 05:36
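A sketch of Brainor's suggestion, assuming absolute dotted module names (not relative ones like ".b"): the function from the answer, extended so that after a submodule such as a.b is patched, it is also bound as an attribute of its already-imported parent package. The my_pkg package written to a temporary directory is a hypothetical stand-in used only so the example runs on its own.

```python
import contextlib
import importlib
import importlib.util
import io
import os
import sys
import tempfile


def modify_and_import(module_name, package, modification_func):
    # Assumes an absolute module_name such as "a.b" (not a relative ".b").
    spec = importlib.util.find_spec(module_name, package)  # imports parent "a"
    source = spec.loader.get_source(module_name)
    new_source = modification_func(source)
    module = importlib.util.module_from_spec(spec)
    codeobj = compile(new_source, spec.origin, "exec")
    exec(codeobj, module.__dict__)
    sys.modules[module_name] = module
    # Bind the patched submodule on its parent so that attribute access
    # (a.b) agrees with sys.modules["a.b"].
    parent_name, _, child_name = module_name.rpartition(".")
    if parent_name:
        setattr(sys.modules[parent_name], child_name, module)
    return module


# Hypothetical package: my_pkg/__init__.py and my_pkg/my_module.py.
tmp_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp_dir, "my_pkg")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "my_module.py"), "w") as f:
    f.write('print("hello")\n')
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
for name in ("my_pkg", "my_pkg.my_module"):
    sys.modules.pop(name, None)

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    mod = modify_and_import(
        "my_pkg.my_module", None, lambda src: src.replace("hello", "world")
    )

print(buf.getvalue().strip())  # world
print(sys.modules["my_pkg"].my_module is mod)  # True
```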

This doesn't answer the general question of dynamically modifying the source code of an imported module, but "overriding" or "monkey-patching" its use of the print() function can be done, since print() is a built-in function in Python 3.x. Here's how:

#!/usr/bin/env python3
# my_script.py

import builtins

_print = builtins.print

def my_print(*args, **kwargs):
    _print('In my_print: ', end='')
    return _print(*args, **kwargs)

builtins.print = my_print

import my_module  # -> In my_print: hello
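A variation on the same idea, sketched under the assumption that the patch should only be active while my_module is imported: a context manager that swaps text in print()'s string arguments and always restores the real builtins.print afterwards. The name patched_print is hypothetical, and a direct print("hello") call stands in for import my_module so the example is self-contained. Keep in mind that a module's top-level code only runs on its first import.

```python
import builtins
import contextlib
import io


@contextlib.contextmanager
def patched_print(old, new):
    """Temporarily replace builtins.print with a text-swapping wrapper.

    Every module that calls print() while the context is active is
    affected, not only the module being imported.
    """
    original_print = builtins.print

    def my_print(*args, **kwargs):
        # Rewrite only str arguments; pass everything else through.
        args = tuple(a.replace(old, new) if isinstance(a, str) else a for a in args)
        return original_print(*args, **kwargs)

    builtins.print = my_print
    try:
        yield
    finally:
        # Always restore the real built-in, even if the body raises.
        builtins.print = original_print


buf = io.StringIO()
with contextlib.redirect_stdout(buf), patched_print("hello", "world"):
    print("hello")  # stands in for "import my_module"

print(repr(buf.getvalue()))  # 'world\n'
```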
martineau

I first needed to better understand the import operation. Fortunately, this is well explained in the importlib documentation, and digging through the source code helped too.

The import process is actually split into two parts. First, a finder is in charge of resolving the module name (including dot-separated packages) and instantiating an appropriate loader; for example, built-in modules are not imported the same way as local modules. Then, the loader returned by the finder is called. This loader gets the code from a source file or from a cache, and executes it if the module was not previously loaded.
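The two phases described above can be observed directly with importlib.util. This minimal sketch uses two standard-library modules as arbitrary examples (sys and json): the finder phase yields a ModuleSpec describing where the module lives and which loader handles it, and the loader phase creates and executes a module object.

```python
import importlib.util

# Finder phase for a built-in module: no source file is involved.
sys_spec = importlib.util.find_spec("sys")
print(sys_spec.origin)  # built-in

# Finder phase for a pure-Python module found on disk by a path-based finder.
json_spec = importlib.util.find_spec("json")
print(json_spec.origin)  # typically a path ending in json/__init__.py
print(json_spec.loader)  # the loader instance the finder chose

# Loader phase: create an empty module object, then execute its code in it.
json_copy = importlib.util.module_from_spec(json_spec)
json_spec.loader.exec_module(json_copy)
print(json_copy.dumps({"a": 1}))  # {"a": 1}
```

Note that module_from_spec() on its own only creates the module object; nothing from the module runs until exec_module() is called.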

This is very simple. It explains why I actually did not need the abstract classes from importlib.abc: I do not want to provide my own import process. Instead, I could create a subclass of one of the classes from importlib.machinery, for example SourceFileLoader, and override its get_source(). However, this is not the way to go, because the loader is instantiated by the finder, so I have no control over which class is used: I cannot specify that my subclass should be used.

So, the best solution is to let the finder do its job, and then replace the get_source() method of whatever Loader has been instantiated.

Unfortunately, by looking through the source code I saw that the basic loaders do not use get_source() when loading a module (it is used by introspection tools such as the inspect module). So my whole idea could not work.
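The sketch below reproduces the attempt from the question against a temporary copy of my_module.py (a setup assumed only so it runs anywhere). It suggests two reasons the attempt fails: module_from_spec() only creates an empty module object, and even when exec_module() is then called, CPython's SourceFileLoader builds the code object without going through get_source(), so the patched method is ignored.

```python
import contextlib
import importlib
import importlib.util
import io
import os
import sys
import tempfile

# Hypothetical setup: recreate the question's my_module.py in a temp dir.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "my_module.py"), "w") as f:
    f.write('print("hello")\n')
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
sys.modules.pop("my_module", None)

spec = importlib.util.find_spec("my_module")
original_get_source = spec.loader.get_source


def mocked_get_source(fullname):
    return original_get_source(fullname).replace("hello", "world")


spec.loader.get_source = mocked_get_source  # the question's attempt

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    module = importlib.util.module_from_spec(spec)  # creates, does not run
    # exec_module() obtains the code via get_code(), which reads the file
    # (or cached bytecode) directly and never calls get_source().
    spec.loader.exec_module(module)

print(buf.getvalue().strip())  # hello: the patch was ignored
print(spec.loader.get_source("my_module").strip())  # print("world")
```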

In the end, I guess get_source() should be called manually, then the returned source should be modified, and finally the code should be executed. This is what Martin Valgur detailed in his answer.

If compatibility with Python 2 is needed, I see no other way than reading the source file:

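import imp  # Python 2 only here; imp was removed in Python 3.12
import sys
import types

module_name = "my_module"

# find_module() returns an already-open file for a source module
fp, pathname, description = imp.find_module(module_name)
try:
    source = fp.read()
finally:
    fp.close()

source = source.replace('hello', 'world')

# Execute the modified source in a fresh module object and register it
module = types.ModuleType(module_name)
exec(source, module.__dict__)

sys.modules[module_name] = module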
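Since imp was deprecated in Python 3.4 and removed in Python 3.12, here is a sketch of the same read-the-file approach for Python 3 only, locating the file with importlib.util.find_spec() instead of imp.find_module(). The temporary directory and the captured output are assumptions added only so the example runs on its own.

```python
import contextlib
import importlib
import importlib.util
import io
import os
import sys
import tempfile
import types

module_name = "my_module"

# Hypothetical setup so the sketch runs anywhere: write the question's
# my_module.py to a temporary directory placed first on sys.path.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, module_name + ".py"), "w") as f:
    f.write('print("hello")\n')
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
sys.modules.pop(module_name, None)

# Python 3 stand-in for imp.find_module(): the spec's origin is the file path.
spec = importlib.util.find_spec(module_name)
with open(spec.origin) as f:
    source = f.read()

source = source.replace("hello", "world")

module = types.ModuleType(module_name)
module.__file__ = spec.origin
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    # Compile with the real path so tracebacks point at the file.
    exec(compile(source, spec.origin, "exec"), module.__dict__)

sys.modules[module_name] = module
print(buf.getvalue().strip())  # world
```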
Community
Delgan

If importing the module before patching it is acceptable, then a possible solution would be

import inspect

import my_module

source = inspect.getsource(my_module)
new_source = source.replace('"hello"', '"world"')
exec(new_source, my_module.__dict__)

If you're after a more general solution, then you can also take a look at the approach I used in another answer a while ago.
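A self-contained sketch of the approach above makes its main limitation visible: because my_module is imported first, its original top-level code has already run, so "hello" is printed before the patched source prints "world". The temporary-directory setup is an assumption added only to make the example runnable.

```python
import contextlib
import importlib
import inspect
import io
import os
import sys
import tempfile

# Hypothetical setup: the question's my_module.py in a temp directory.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "my_module.py"), "w") as f:
    f.write('print("hello")\n')
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
sys.modules.pop("my_module", None)

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    import my_module  # the original code runs here and prints "hello"

    source = inspect.getsource(my_module)
    new_source = source.replace('"hello"', '"world"')
    exec(new_source, my_module.__dict__)  # re-runs the patched code

print(buf.getvalue().split())  # ['hello', 'world']
```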

Community
Martin Valgur
  • How could this be useful to me? How would you change the print value using your workaround? – Delgan Jan 25 '17 at 20:23
  • Sorry, I assumed you wanted a generic method to monkey patch any part of a module. Reading your question again it seems that you wish to avoid importing the module first, in which case I agree, my solution would not be relevant here. – Martin Valgur Jan 25 '17 at 21:23
  • I rewrote my answer entirely. Is this useful to you? If not, I'll delete it. – Martin Valgur Jan 25 '17 at 21:32
  • Thank you. This is not useful to me but this could help someone else (who would not have the problem of mocking before importing) so please do not delete your answer. ;) – Delgan Jan 25 '17 at 21:43

My solution updates the source file itself, which also works for "inner imports", i.e. when the library imports the module internally (here, other parts of transformers.models.albert import modeling_albert from its source file). In that case, even Martin Valgur's solution does not work, so I modify the source file instead. I hope this helps people who have the same problem.

import inspect
from transformers.models.albert import modeling_albert

# Get source
source = inspect.getsource(modeling_albert)
source_before = "AlbertModel(config, add_pooling_layer=False)"
source_after = "AlbertModel(config, add_pooling_layer=True)"
new_source = source.replace(source_before, source_after)

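# Update file (this overwrites the installed modeling_albert.py on disk;
# the change takes effect the next time the module is imported in a new process)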
file_path = modeling_albert.__spec__.origin
with open(file_path, 'w') as f:
    f.write(new_source)
BrambleXu

Not elegant, but works for me (may have to add a path):

with open('my_module.py') as a_file:
    exec(a_file.read().replace('hello', 'world'))
Jacques de Hooge
  • I stated that I would like to avoid having to specify the module path. Moreover, as you said, `exec()` is not elegant at all; there should be a better solution. – Delgan Jan 25 '17 at 20:21