Overview
You can, but only if you write evil code that probably should never end up in production software. So let's get started!
I'm not going to integrate it into your library, but I will show you how to hook into the behavior of f-strings. This is roughly how it'll work:
- Write a function that manipulates the bytecode instructions of code objects to replace FORMAT_VALUE instructions with calls to a hook function;
- Customize the import mechanism to make sure that the bytecode of every module and package (except standard library modules and site-packages) is modified with that function.
You can get the full source at https://github.com/mivdnber/formathack, but everything is explained below.
Disclaimer
This solution isn't great, because
- There's no guarantee at all that this won't break totally unrelated code;
- There's no guarantee that the bytecode manipulations described here will continue working in newer Python versions, because CPython's opcodes are not a stable interface (for example, ROT_TWO, ROT_THREE and CALL_FUNCTION no longer exist as of Python 3.11). It definitely won't work in alternative Python implementations that don't compile to CPython-compatible bytecode. PyPy could work in theory, but the solution described here doesn't, because the bytecode package isn't 100% compatible with it.
However, it is a solution, and bytecode manipulation has been used successfully in popular packages like PonyORM. Just keep in mind that it's hacky, complicated and probably maintenance-heavy.
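Given that fragility, a cautious version of the hack could refuse to install itself when the interpreter lacks the opcodes the rewrite emits. The sketch below is an assumption on my part, not part of formathack: `formathack_supported` and `REQUIRED_OPCODES` are hypothetical names, and checking `dis.opmap` is just one way to detect support.

```python
import dis
import sys

# Opcodes that the FORMAT_VALUE rewrite reads or emits. ROT_TWO, ROT_THREE and
# CALL_FUNCTION were removed in CPython 3.11, FORMAT_VALUE in 3.13.
REQUIRED_OPCODES = ("FORMAT_VALUE", "ROT_TWO", "ROT_THREE", "CALL_FUNCTION")


def formathack_supported(opmap=None, implementation=None):
    """Return True if this interpreter is CPython and has every required opcode.

    Hypothetical helper; the parameters exist only so it can be tested with
    fake values instead of the running interpreter's.
    """
    if opmap is None:
        opmap = dis.opmap
    if implementation is None:
        implementation = sys.implementation.name
    return implementation == "cpython" and all(op in opmap for op in REQUIRED_OPCODES)
```

An `install()` function could call this first and raise an exception instead of silently producing broken code objects.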
Part 1: Bytecode manipulation
Python code is not executed directly; it is first compiled to a simpler, intermediate, stack-based language called Python bytecode, which isn't meant to be human-readable (it's what's inside *.pyc files). To get an idea of what that bytecode looks like, you can use the standard library dis module to inspect the bytecode of a simple function:
def invalid_format(x):
    return f"{x:foo}"
Calling this function will cause an exception, but we'll "fix" that soon.
>>> invalid_format("bar")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in invalid_format
ValueError: Invalid format specifier
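The error doesn't come from the f-string syntax itself: FORMAT_VALUE passes the value and the spec to the same formatting protocol the built-in format() uses (the object's __format__ method), and str simply rejects "foo" as a spec. A quick check of that equivalence, using two throwaway functions (`via_fstring`, `via_format` and `raised` are illustrative names only):

```python
def via_fstring(value):
    return f"{value:foo}"


def via_format(value):
    # Same value and spec, but through the built-in format() directly
    return format(value, "foo")


def raised(func, value):
    """Return the exception type that func(value) raises, or None if it succeeds."""
    try:
        func(value)
    except Exception as exc:
        return type(exc)
    return None
```

Both raise ValueError for "bar", and for a valid spec both give the same result, e.g. `f"{3.14159:.2f}"` and `format(3.14159, ".2f")` are both `'3.14'`.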
To inspect the bytecode, fire up a Python console and call dis.dis:
>>> import dis
>>> dis.dis(invalid_format)
  2           0 LOAD_FAST                0 (x)
              2 LOAD_CONST               1 ('foo')
              4 FORMAT_VALUE             4 (with format)
              6 RETURN_VALUE
I've annotated the output below to explain what's happening:
# line 2      # Put the value of function parameter x on the stack
  2           0 LOAD_FAST                0 (x)
              # Put the format spec on the stack as a string
              2 LOAD_CONST               1 ('foo')
              # Pop both values from the stack and perform the actual formatting
              # This puts the formatted string on the stack
              4 FORMAT_VALUE             4 (with format)
              # Pop the result from the stack and return it
              6 RETURN_VALUE
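The 4 next to FORMAT_VALUE is a small bit field. Per the dis documentation for Python 3.6 to 3.12, the low two bits select a conversion (none, !s, !r or !a) and bit 0x04 says a format spec was pushed; Python 3.13 replaced FORMAT_VALUE with FORMAT_SIMPLE, FORMAT_WITH_SPEC and CONVERT_VALUE. The helpers below are an illustrative sketch (the names are mine) that decode the argument and list the formatting opcodes of a function on whichever version you run.

```python
import dis

# Meaning of the low two bits of the FORMAT_VALUE argument
_CONVERSIONS = {0x00: None, 0x01: "str", 0x02: "repr", 0x03: "ascii"}


def decode_format_value_arg(arg):
    """Split a FORMAT_VALUE argument into (conversion, has_format_spec)."""
    conversion = _CONVERSIONS[arg & 0x03]
    has_format_spec = (arg & 0x04) == 0x04
    return conversion, has_format_spec


def formatting_opnames(func):
    """Names of the formatting opcodes in func (FORMAT_VALUE on 3.6-3.12,
    FORMAT_SIMPLE / FORMAT_WITH_SPEC on 3.13+)."""
    return [ins.opname for ins in dis.get_instructions(func)
            if ins.opname.startswith("FORMAT_")]


def invalid_format(x):
    return f"{x:foo}"
```

So `decode_format_value_arg(4)` is `(None, True)`: no conversion, spec present, which matches the "(with format)" annotation in the dis output.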
The idea here is to replace the FORMAT_VALUE instruction with a call to a hook function that allows us to implement whatever behavior we want. Let's implement it like this for now:
def formathack_hook__(value, format_spec=None):
    """
    Gets called whenever a value is formatted. Right now it's a silly implementation,
    but it can be expanded with all sorts of nasty hacks.
    """
    return f"{value} formatted with {format_spec}"
To replace the instruction, I used the bytecode package, which provides surprisingly nice abstractions for doing horrible things.
import types
from bytecode import Bytecode, Instr
def formathack_rewrite_bytecode__(code):
    """
    Modifies a code object to override the behavior of the FORMAT_VALUE
    instructions used by f-strings.
    """
    decompiled = Bytecode.from_code(code)
    modified_instructions = []
    for instruction in decompiled:
        # Labels and other pseudo-instructions have no name
        name = getattr(instruction, 'name', None)
        if name == 'FORMAT_VALUE':
            # 0x04 means that a format spec is present (the low two bits hold the
            # !s/!r/!a conversion, which this hack simply drops)
            if instruction.arg & 0x04 == 0x04:
                callback_arg_count = 2
            else:
                callback_arg_count = 1
            modified_instructions.extend([
                # Load in the callback
                Instr("LOAD_GLOBAL", "formathack_hook__"),
                # Shuffle the top of the stack so the hook sits below its
                # arguments, as CALL_FUNCTION expects
                Instr("ROT_THREE" if callback_arg_count == 2 else "ROT_TWO"),
                # Call the callback function instead of executing FORMAT_VALUE
                Instr("CALL_FUNCTION", callback_arg_count)
            ])
        # Kind of nasty: we want to recursively alter the code of functions.
        elif name == 'LOAD_CONST' and isinstance(instruction.arg, types.CodeType):
            modified_instructions.extend([
                Instr("LOAD_CONST", formathack_rewrite_bytecode__(instruction.arg), lineno=instruction.lineno)
            ])
        else:
            modified_instructions.append(instruction)
    modified_bytecode = Bytecode(modified_instructions)
    # For functions, copy over argument definitions
    modified_bytecode.argnames = decompiled.argnames
    modified_bytecode.argcount = decompiled.argcount
    modified_bytecode.name = decompiled.name
    return modified_bytecode.to_code()
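The stack shuffling is the subtle part of that rewrite. The following sketch replays, in plain Python with a list standing in for the evaluation stack, what LOAD_GLOBAL, ROT_TWO/ROT_THREE and CALL_FUNCTION do to the values FORMAT_VALUE would have consumed. It is a simulation for illustration (`simulate_rewritten_format` and `fake_hook` are made-up names), not the interpreter itself.

```python
def fake_hook(value, format_spec=None):
    # Stand-in for formathack_hook__ that just reports what it received
    return (value, format_spec)


def simulate_rewritten_format(value, format_spec=None):
    """Replay the replacement instructions on a list used as the stack."""
    # State right before the original FORMAT_VALUE
    stack = [value] if format_spec is None else [value, format_spec]
    stack.append(fake_hook)  # LOAD_GLOBAL formathack_hook__
    if format_spec is None:
        # ROT_TWO: swap the two topmost items -> [hook, value]
        stack[-1], stack[-2] = stack[-2], stack[-1]
        argc = 1
    else:
        # ROT_THREE: move the top item down to third place -> [hook, value, spec]
        stack[-3:] = [stack[-1], stack[-3], stack[-2]]
        argc = 2
    # CALL_FUNCTION argc: pop argc arguments, then the callable beneath them
    args = stack[-argc:]
    del stack[-argc:]
    func = stack.pop()
    stack.append(func(*args))
    return stack
```

Either way, exactly one value (the hook's result) is left on the stack, which is the same net effect FORMAT_VALUE has, so the surrounding bytecode doesn't need to change.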
We can now make the invalid_format function we defined earlier work:
>>> invalid_format.__code__ = formathack_rewrite_bytecode__(invalid_format.__code__)
>>> invalid_format("bar")
'bar formatted with foo'
Success! Manually cursing code objects with tainted bytecode in itself won't damn our souls to an eternity of suffering though; for that, we should manipulate all code automatically.
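One reason the rewrite has to recurse on LOAD_CONST: a module's code object only holds its top-level statements, while every function body is a separate code object stored in the enclosing code's co_consts. This stdlib-only sketch (`iter_code_objects` is an illustrative name) walks that tree for a small compiled module.

```python
import types


def iter_code_objects(code):
    """Yield code and, recursively, every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


source = (
    "def outer():\n"
    "    def inner(x):\n"
    "        return f'{x:foo}'\n"
    "    return inner\n"
)
module_code = compile(source, "<demo>", "exec")
names = [code.co_name for code in iter_code_objects(module_code)]
```

Here `names` is `['<module>', 'outer', 'inner']`: the f-string lives two levels down, which is exactly where a non-recursive rewrite would miss it.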
Part 2: Hooking into the import process
To make the new f-string behavior work everywhere, and not just in manually patched functions, we can customize the Python module import process with a custom module finder and loader using the functionality provided by the standard library importlib module:
import importlib.machinery
class _FormatHackLoader(importlib.machinery.SourceFileLoader):
    """
    A module loader that modifies the code of the modules it imports to override
    the behavior of f-strings. Nasty stuff.
    """
    @classmethod
    def find_spec(cls, name, path, target=None):
        # Start out with a spec from a default finder
        spec = importlib.machinery.PathFinder.find_spec(
            fullname=name,
            # Only apply to modules and packages in the current directory
            # This prevents standard library modules or site-packages
            # from being patched.
            path=[""],
            target=target
        )
        # Only take over plain .py sources; extension modules, namespace
        # packages etc. are left to the default finders
        if spec is None or not isinstance(spec.loader, importlib.machinery.SourceFileLoader):
            return None
        # Modify the loader in the spec to this loader
        spec.loader = cls(name, spec.origin)
        return spec
    def get_code(self, fullname):
        # This is called by exec_module to get the code of the module
        # to execute it.
        code = super().get_code(fullname)
        # Rewrite the code to modify the f-string formatting opcodes
        rewritten_code = formathack_rewrite_bytecode__(code)
        return rewritten_code
    def exec_module(self, module):
        # We introduce the callback that hooks into the f-string formatting
        # process in every imported module
        module.__dict__["formathack_hook__"] = formathack_hook__
        return super().exec_module(module)
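The exec_module override is what makes formathack_hook__ available as a global in every patched module. To see that part of the mechanism in isolation, here is a stdlib-only sketch with no bytecode rewriting, so it also runs on current Python versions. InjectingLoader, InjectingFinder, injected_global__ and the temporary module name are all hypothetical, and unlike _FormatHackLoader the finder is an instance limited to one directory and removed again after the import.

```python
import importlib
import importlib.machinery
import os
import sys
import tempfile


class InjectingLoader(importlib.machinery.SourceFileLoader):
    """Hypothetical loader: adds a global to the module before its code runs."""

    def exec_module(self, module):
        module.__dict__["injected_global__"] = "hello from the loader"
        super().exec_module(module)


class InjectingFinder:
    """Hypothetical meta path finder that only looks in a single directory."""

    def __init__(self, directory):
        self.directory = directory

    def find_spec(self, fullname, path=None, target=None):
        spec = importlib.machinery.PathFinder.find_spec(fullname, [self.directory], target)
        # Only take over plain .py sources, like the loader above
        if spec is None or not isinstance(spec.loader, importlib.machinery.SourceFileLoader):
            return None
        spec.loader = InjectingLoader(fullname, spec.origin)
        return spec


def import_with_injection(directory, module_name):
    """Import module_name from directory through InjectingFinder, then unregister it."""
    finder = InjectingFinder(directory)
    importlib.invalidate_caches()
    sys.meta_path.insert(0, finder)
    try:
        return importlib.import_module(module_name)
    finally:
        sys.meta_path.remove(finder)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "injection_demo_mod.py"), "w") as f:
        # This module reads a global it never defines itself
        f.write("seen = injected_global__\n")
    demo_module = import_with_injection(tmp, "injection_demo_mod")
    demo_value = demo_module.seen
    # Tidy up so the demo leaves no trace in the import system
    sys.modules.pop("injection_demo_mod", None)
    sys.path_importer_cache.pop(tmp, None)
```

The imported module sees `injected_global__` even though it never defines or imports it, which is the same trick that lets rewritten bytecode execute `LOAD_GLOBAL formathack_hook__`.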
To make sure the Python interpreter uses this loader to import all files, we have to add it to sys.meta_path:
import inspect
import sys
def install():
    # If the _FormatHackLoader is not registered as a finder,
    # do it now!
    if sys.meta_path[0] is not _FormatHackLoader:
        sys.meta_path.insert(0, _FormatHackLoader)
        # Tricky part: we want to be able to use our custom f-string behavior
        # in the main module where install was called. That module was loaded
        # with a standard loader though, so that's impossible without additional
        # dirty hacks.
        # Here, we execute the module _again_, this time with _FormatHackLoader.
        # Everything below must stay inside this if block: the re-executed module
        # calls install() again, and that second call must do nothing.
        module_globals = inspect.currentframe().f_back.f_globals
        module_name = module_globals["__name__"]
        module_file = module_globals["__file__"]
        loader = _FormatHackLoader(module_name, module_file)
        loader.load_module(module_name)
        # This is actually pretty important. If we don't exit here, the main module
        # will continue after the formathack.install() call, causing it to run twice!
        sys.exit(0)
If we put it all together in a formathack module (see https://github.com/mivdnber/formathack for an integrated, working example), we can now use it like this:
# In your main Python module, install formathack ASAP
import formathack
formathack.install()
# From now on, f-string behavior will be overridden!
foo = "foo"
print(f"{foo:bar}")
# prints: foo formatted with bar
So that's that! You can expand on this to make the hook function more intelligent and useful (e.g. by registering functions that handle certain format specifiers).