Summary

I really LOVE f-strings. They're bloody awesome syntax.

For a while now I've had an idea for a little library (described below*) to harness them further. A quick example of what I would like it to do:

>>> import simpleformatter as sf
>>> def format_camel_case(string):
...     """camel cases a sentence"""
...     return ''.join(s.capitalize() for s in string.split())
...
>>> @sf.formattable(camcase=format_camel_case)
... class MyStr(str): ...
...
>>> f'{MyStr("lime cordial delicious"):camcase}'
'LimeCordialDelicious'

It would be immensely useful-- for the purposes of a simplified API, and extending usage to built-in class instances-- to find a way to hook into the builtin python formatting machinery, which would allow the custom format specification of built-ins:

>>> f'{"lime cordial delicious":camcase}'
'LimeCordialDelicious'

In other words, I'd like to override the built in format function (which is used by the f-string syntax)-- or alternatively, extend the built-in __format__ methods of existing standard library classes-- such that I could write stuff like this:

for x, y, z in complicated_generator:
    eat_string(f"x: {x:custom_spec1}, y: {y:custom_spec2}, z: {z:custom_spec3}")

I have accomplished this by creating subclasses with their own __format__ methods, but of course this will not work for built-in classes.
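For reference, a minimal sketch of that subclass approach. The `formattable` decorator here is a hypothetical stand-in written for illustration, not the actual simpleformatter code; it maps spec strings to handler functions and falls back to the inherited `__format__` for everything else:

```python
def formattable(**spec_handlers):
    """Hypothetical class decorator: maps format specs to handler functions."""
    def decorate(cls):
        parent_format = cls.__format__  # e.g. str.__format__ for a str subclass

        def __format__(self, format_spec):
            handler = spec_handlers.get(format_spec)
            if handler is not None:
                return handler(self)
            # Unknown specs keep their normal meaning
            return parent_format(self, format_spec)

        cls.__format__ = __format__
        return cls
    return decorate


def format_camel_case(string):
    """camel cases a sentence"""
    return ''.join(s.capitalize() for s in string.split())


@formattable(camcase=format_camel_case)
class MyStr(str):
    pass


result = f'{MyStr("lime cordial delicious"):camcase}'
print(result)  # LimeCordialDelicious
```

This works only because `MyStr` is a class we control; `str.__format__` itself cannot be reassigned, which is the limitation described above.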

I could get close to it using the string.Formatter api:

my_formatter=MyFormatter()  # custom string.Formatter instance

format_str = "x: {x:custom_spec1}, y: {y:custom_spec2}, z: {z:custom_spec3}"

for x, y, z in complicated_generator:
    eat_string(my_formatter.format(format_str, **locals()))

I find this to be a tad clunky, and definitely not readable compared to the f-string api.
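A `MyFormatter` along those lines could look roughly like the sketch below. It assumes custom specs are matched by exact string and handled by plain functions; the constructor signature and the `hexy` spec are made up for illustration. `string.Formatter.format_field` is the documented override point for per-field formatting:

```python
import string


class MyFormatter(string.Formatter):
    """Sketch: dispatch custom format specs to registered handler functions."""

    def __init__(self, **spec_handlers):
        super().__init__()
        self.spec_handlers = spec_handlers  # hypothetical: spec name -> function

    def format_field(self, value, format_spec):
        handler = self.spec_handlers.get(format_spec)
        if handler is not None:
            return handler(value)
        # Anything else goes through the normal format() machinery
        return super().format_field(value, format_spec)


my_formatter = MyFormatter(hexy=hex)
formatted = my_formatter.format("x: {x:hexy}, y: {y:>4}", x=255, y=7)
print(formatted)
```

Unlike f-strings, this works for built-in types, since the value's own `__format__` is only consulted for specs that have no registered handler.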

Another thing that could be done is overriding builtins.format:

>>> import builtins
>>> builtins.format = lambda *args, **kwargs: 'womp womp'
>>> format(1,"foo")
'womp womp'

...but this doesn't work for f-strings:

>>> f"{1:foo}"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid format specifier
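The reason, as far as I understand it, is that an f-string never looks up the name `format`: the interpreter formats the value through the `__format__` method of its type directly. A small sketch that shows the difference, using a throwaway `Probe` class (a name invented for this example):

```python
import builtins


class Probe:
    """Throwaway class that reports which spec its __format__ received."""
    def __format__(self, spec):
        return f"Probe.__format__({spec!r})"


original_format = builtins.format
builtins.format = lambda *args, **kwargs: 'womp womp'
try:
    via_builtin = format(Probe(), "foo")   # goes through the patched name
    via_fstring = f"{Probe():foo}"         # bypasses builtins.format entirely
finally:
    builtins.format = original_format      # always undo the monkeypatch

print(via_builtin)
print(via_fstring)
```

So replacing `builtins.format` only affects explicit `format(...)` calls, not f-strings or `str.format`.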

Details

Currently my API looks something like this (somewhat simplified):

import simpleformatter as sf
@sf.formatter("this_specification")
def this_formatting_function(some_obj):
    return "this formatted someobj!"

@sf.formatter("that_specification")
def that_formatting_function(some_obj):
    return "that formatted someobj!"

@sf.formattable
class SomeClass: ...

After which you can write code like this:

some_obj = SomeClass()
f"{some_obj:this_specification}"
f"{some_obj:that_specification}"

I would like the api to be more like the below:

@sf.formatter("this_specification")
def this_formatting_function(some_obj):
    return "this formatted someobj!"

@sf.formatter("that_specification")
def that_formatting_function(some_obj):
    return "that formatted someobj!"

class SomeClass: ...  # no class decorator needed

...and allow use of custom format specs on built-in classes:

x=1  # built-in type instance
f"{x:this_specification}"
f"{x:that_specification}"

But in order to do these things, we have to burrow our way into the built-in format() function. How can I hook into that juicy f-string goodness?

* NOTE: I'll probably never actually get around to implementing this library! But I do think it's a neat idea and invite anyone who wants to, to steal it from me :).

Rick
  • See https://stackoverflow.com/questions/47081521/can-you-overload-the-python-3-6-f-strings-operator – darksky Apr 27 '19 at 03:39
  • Could you write `f"{spec(x)}"`? – Davis Herring Apr 27 '19 at 03:52
  • @darksky it's funny: I commented on that question over a year ago so it seems I've been thinking about this for a while! The problem I'm trying to solve here is a little different though. – Rick Apr 27 '19 at 13:55
  • @DavisHerring that's certainly another way of doing it, but then you have to import the functions you want to use every time. – Rick Apr 27 '19 at 13:58
  • @RickTeachey: They have to be imported somewhere if they’re going to be registered with the formatting system, so that doesn’t bother me. It also has the advantage of not risking conflicts in another global namespace (beyond that of top-level modules/packages). – Davis Herring Apr 27 '19 at 14:38
  • @DavisHerring yeah that's an advantage. but i still think it would be useful to be able to extend the specification mini language- or tack on your own mini language- and access it using format specs rather than importing functions. but it does open the door for conflicts. – Rick Apr 27 '19 at 15:19
  • Check out my progress on this: https://stackoverflow.com/questions/61187996/how-can-i-parse-pythons-triple-quote-f-strings?noredirect=1#comment108245877_61187996 – HappyFace Apr 13 '20 at 13:16
  • What about `f("{x:spec}")`, using `string.Formatter` along with `inspect.currentframe().f_back` `.f_locals` and `.f_globals` (see https://docs.python.org/3/library/inspect.html and https://stackoverflow.com/questions/6618795/get-locals-from-calling-namespace-in-python)? You could also use an operator such as [`@`](https://www.python.org/dev/peps/pep-0465/#so-is-good-for-matrix-formulas-but-how-common-are-those-really) to improve the syntax (`f@"{x:spec}"`). – Solomon Ucko Oct 28 '21 at 22:21
  • @SolomonUcko the `f@` idea is pretty creative! downside: the IDE wouldn't know the string contains expressions without the leading f. but that's hardly a dealbreaker. – Rick Oct 29 '21 at 20:00
  • @RicksupportsMonica do you think there's any chance of a new PEP for configurable f-string converters (e.g. `f'{x!spec}'` where spec is user-defined or defined in an imported module) or global f-string converters (e.g. `f'{x:spec}'` where `spec` is not handled by `x.__format__`)? – Will Da Silva Oct 29 '21 at 20:21
  • @WillDaSilva Never say never but I have spent some time on python-ideas and people are generally opposed to anything that amounts to "spooky action at a distance"... you'd have to find a way to do it that doesn't pollute all string formatting throughout the entire python instance, in all modules... Which might not really be possible? Unsure. – Rick Oct 29 '21 at 21:09
  • @RicksupportsMonica Thanks for the insight. Personally I've been interested in something like this for a long while to provide better language interop. I have a module that provides conversions to/from Python and another language, and a function that can accept strings of code from said language to be evaluated. For example, if the other language was called "v", it'd be great if I could write something like `v_eval(f'{x!v}')`, where `x` is a Python variable that gets interpolated into the v-lang code, i.e. by converting it and providing some reference understood by v. – Will Da Silva Oct 29 '21 at 21:46
  • @WillDaSilva I originally wanted this for a metaprogramming task I was working on (since abandoned). Basically was taking a human readable specification (in toml) and dynamically generating a bunch of classes, and wanted to be able to associate formatting codes attached to those classes with built-in types. This was years ago. It was a disaster. I didn't know what I was doing. – Rick Oct 29 '21 at 22:35

1 Answer

Overview

You can, but only if you write evil code that probably should never end up in production software. So let's get started!

I'm not going to integrate it into your library, but I will show you how to hook into the behavior of f-strings. This is roughly how it'll work:

  1. Write a function that manipulates the bytecode instructions of code objects to replace FORMAT_VALUE instructions with calls to a hook function;
  2. Customize the import mechanism to make sure that the bytecode of every module and package (except standard library modules and site-packages) is modified with that function.

You can get the full source at https://github.com/mivdnber/formathack, but everything is explained below.

Disclaimer

This solution isn't great, because

  1. There's no guarantee at all that this won't break totally unrelated code;
  2. There's no guarantee that the bytecode manipulations described here will continue working in newer Python versions. It definitely won't work in alternative Python implementations that don't compile to CPython compatible bytecode. PyPy could work in theory, but the solution described here doesn't because the bytecode package isn't 100% compatible.

However, it is a solution, and bytecode manipulation has been used successfully in popular packages like PonyORM. Just keep in mind that it's hacky, complicated and probably maintenance-heavy.

Part 1: Bytecode manipulation

Python code is not executed directly, but is first compiled to a simpler, intermediary, non-human-readable, stack-based language called Python bytecode (it's what's inside *.pyc files). To get an idea of what that bytecode looks like, you can use the standard library dis module to inspect the bytecode of a simple function:

def invalid_format(x):
    return f"{x:foo}"

Calling this function will cause an exception, but we'll "fix" that soon.

>>> invalid_format("bar")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in invalid_format
ValueError: Invalid format specifier

To inspect the bytecode, fire up a Python console and call dis.dis:

>>> import dis
>>> dis.dis(invalid_format)
  2           0 LOAD_FAST                0 (x)
              2 LOAD_CONST               1 ('foo')
              4 FORMAT_VALUE             4 (with format)
              6 RETURN_VALUE

I've annotated the output below to explain what's happening:

# line 2      # Put the value of function parameter x on the stack
  2           0 LOAD_FAST                0 (x)
              # Put the format spec on the stack as a string
              2 LOAD_CONST               1 ('foo')
              # Pop both values from the stack and perform the actual formatting
              # This puts the formatted string on the stack
              4 FORMAT_VALUE             4 (with format)
              # pop the result from the stack and return it
              6 RETURN_VALUE
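The exact listing depends on the interpreter version. To my knowledge, CPython 3.13 split `FORMAT_VALUE` into separate `FORMAT_SIMPLE` and `FORMAT_WITH_SPEC` opcodes, so treat the disassembly above as typical of the older releases this answer targets. A small sketch for checking which formatting opcode your own interpreter emits, using only the standard library `dis.get_instructions`:

```python
import dis


def invalid_format(x):
    return f"{x:foo}"


# Collect opcode names instead of relying on the printed dis.dis layout,
# which changes between CPython releases.
opnames = [ins.opname for ins in dis.get_instructions(invalid_format)]
format_ops = [name for name in opnames if name.startswith("FORMAT_")]
print(format_ops)
```

If the list does not contain `FORMAT_VALUE`, the rewriting function below would need to be adapted to the opcodes your version uses.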

The idea here is to replace the FORMAT_VALUE instruction with a call to a hook function that allows us to implement whatever behavior we want. Let's implement it like this for now:

def formathack_hook__(value, format_spec=None):
    """
    Gets called whenever a value is formatted. Right now it's a silly implementation,
    but it can be expanded with all sorts of nasty hacks.
    """
    return f"{value} formatted with {format_spec}"

To replace the instruction, I used the bytecode package, which provides surprisingly nice abstractions for doing horrible things.

import types

# Third-party package: pip install bytecode
from bytecode import Bytecode, Instr

def formathack_rewrite_bytecode__(code):
    """
    Modifies a code object to override the behavior of the FORMAT_VALUE
    instructions used by f-strings.
    """
    decompiled = Bytecode.from_code(code)
    modified_instructions = []
    for instruction in decompiled:
        name = getattr(instruction, 'name', None)
        if name == 'FORMAT_VALUE':
            # 0x04 means that a format spec is present
            if instruction.arg & 0x04 == 0x04:
                callback_arg_count = 2
            else:
                callback_arg_count = 1
            modified_instructions.extend([
                # Load in the callback
                Instr("LOAD_GLOBAL", "formathack_hook__"),
                # Shuffle around the top of the stack to put the arguments on top
                # of the function global
                Instr("ROT_THREE" if callback_arg_count == 2 else "ROT_TWO"),
                # Call the callback function instead of executing FORMAT_VALUE
                Instr("CALL_FUNCTION", callback_arg_count)
            ])
        # Kind of nasty: we want to recursively alter the code of functions.
        elif name == 'LOAD_CONST' and isinstance(instruction.arg, types.CodeType):
            modified_instructions.extend([
                Instr("LOAD_CONST", formathack_rewrite_bytecode__(instruction.arg), lineno=instruction.lineno)
            ])
        else:
            modified_instructions.append(instruction)
    modified_bytecode = Bytecode(modified_instructions)
    # For functions, copy over argument definitions
    modified_bytecode.argnames = decompiled.argnames
    modified_bytecode.argcount = decompiled.argcount
    modified_bytecode.name = decompiled.name
    return modified_bytecode.to_code()

We can now make the invalid_format function we defined earlier work:

>>> invalid_format.__code__ = formathack_rewrite_bytecode__(invalid_format.__code__)
>>> invalid_format("bar")
'bar formatted with foo'

Success! Manually cursing code objects with tainted bytecode in itself won't damn our souls to an eternity of suffering though; for that, we should manipulate all code automatically.

Part 2: Hooking into the import process

To make the new f-string behavior work everywhere, and not just in manually patched functions, we can customize the Python module import process with a custom module finder and loader using the functionality provided by the standard library importlib module:

import importlib.machinery

class _FormatHackLoader(importlib.machinery.SourceFileLoader):
    """
    A module loader that modifies the code of the modules it imports to override
    the behavior of f-strings. Nasty stuff.
    """
    @classmethod
    def find_spec(cls, name, path, target=None):
        # Start out with a spec from a default finder
        spec = importlib.machinery.PathFinder.find_spec(
            fullname=name,
            # Only apply to modules and packages in the current directory
            # This prevents standard library modules or site-packages
            # from being patched.
            path=[""],
            target=target
        )
        if spec is None:
            return None
        
        # Modify the loader in the spec to this loader
        spec.loader = cls(name, spec.origin)
        return spec

    def get_code(self, fullname):
        # This is called by exec_module to get the code of the module
        # to execute it.
        code = super().get_code(fullname)
        # Rewrite the code to modify the f-string formatting opcodes
        rewritten_code = formathack_rewrite_bytecode__(code)
        return rewritten_code

    def exec_module(self, module):
        # We introduce the callback that hooks into the f-string formatting
        # process in every imported module
        module.__dict__["formathack_hook__"] = formathack_hook__
        return super().exec_module(module)
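The `exec_module` trick can be tried in isolation, without touching `sys.meta_path` or rewriting any bytecode. The sketch below is a simplified illustration rather than part of formathack: `InjectingLoader`, `load_with_injection` and `injected_hook__` are hypothetical names, and the module is a temporary file created just for the demo.

```python
import importlib.machinery
import importlib.util
import os
import tempfile


class InjectingLoader(importlib.machinery.SourceFileLoader):
    """Hypothetical loader: injects a global before the module body runs."""

    def exec_module(self, module):
        module.__dict__["injected_hook__"] = lambda value: f"<{value}>"
        super().exec_module(module)


def load_with_injection(name, path):
    """Load a single source file through InjectingLoader."""
    loader = InjectingLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_mod.py")
    with open(path, "w") as fh:
        # The module uses a name it never defines or imports
        fh.write("greeting = injected_hook__('hi')\n")
    demo = load_with_injection("demo_mod", path)

print(demo.greeting)  # <hi>
```

The module body runs with the injected name already present in its globals, which is the same mechanism that makes `formathack_hook__` available to the rewritten `LOAD_GLOBAL` instructions.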

To make sure the Python interpreter uses this loader to import all files, we have to add it to sys.meta_path:

import inspect
import sys

def install():
    # If the _FormatHackLoader is not registered as a finder,
    # do it now!
    if sys.meta_path[0] is not _FormatHackLoader:
        sys.meta_path.insert(0, _FormatHackLoader)
        # Tricky part: we want to be able to use our custom f-string behavior
        # in the main module where install was called. That module was loaded
        # with a standard loader though, so that's impossible without additional
        # dirty hacks.
        # Here, we execute the module _again_, this time with _FormatHackLoader
        module_globals = inspect.currentframe().f_back.f_globals
        module_name = module_globals["__name__"]
        module_file = module_globals["__file__"]
        loader = _FormatHackLoader(module_name, module_file)
        loader.load_module(module_name)
        # This is actually pretty important. If we don't exit here, the main module
        # will continue from the formathack.install method, causing it to run twice!
        sys.exit(0)

If we put it all together in a formathack module (see https://github.com/mivdnber/formathack for an integrated, working example), we can now use it like this:

# In your main Python module, install formathack ASAP
import formathack
formathack.install()

# From now on, f-string behavior will be overridden!

foo = "foo"
print(f"{foo:bar}")
# -> "foo formatted with bar"

So that's that! You can expand on this to make the hook function more intelligent and useful (e.g. by registering functions that handle certain format specifiers).
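As a rough sketch of that last idea, the hook could consult a registry of handlers and fall back to normal formatting for unknown specs. The `formatter` decorator and `_registry` dict are assumptions made for this example (loosely modelled on the API in the question), not part of the formathack repository:

```python
_registry = {}  # hypothetical: format spec string -> handler function


def formatter(spec):
    """Register the decorated function as the handler for `spec`."""
    def register(func):
        _registry[spec] = func
        return func
    return register


def formathack_hook__(value, format_spec=None):
    """Dispatch to a registered handler, else fall back to built-in formatting."""
    handler = _registry.get(format_spec)
    if handler is not None:
        return handler(value)
    # No custom handler: behave like a regular f-string field
    return format(value, format_spec or "")


@formatter("camcase")
def format_camel_case(value):
    return "".join(s.capitalize() for s in str(value).split())


print(formathack_hook__("lime cordial delicious", "camcase"))  # LimeCordialDelicious
print(formathack_hook__(3.14159, ".2f"))                       # 3.14
```

With the fallback in place, ordinary specs such as `.2f` keep working in patched modules, and only registered names get custom behavior. An unregistered, invalid spec still raises `ValueError`, just as it would without the hack.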

Michilus
  • "They definitely won't work in alternative Python implementations like PyPy." Can you give this a shot? PyPy seems to have the same bytecode *format*, at least at runtime; their JIT only works *with* the bytecode, it doesn't replace it. So there's a good chance this will work in PyPy. – MisterMiyagi Oct 09 '21 at 07:50
  • @MisterMiyagi cool, I didn't know that! I just tested it with PyPy 7.3.5 (3.7.10), and it does seem to fail because `dis.stack_effect` is not available there. Still, "definitely won't work" is an overstatement, so I'll edit the answer. – Michilus Oct 09 '21 at 08:49
  • Coming back to this a bit later, it's just so beautiful and really needs to be a pycon talk. – Rick Mar 24 '23 at 18:06