First of all, I know this is a horrible idea and it shouldn't be done. I'm asking mainly out of curiosity, to learn the innards of Python and how to 'hack' them.

I was wondering whether it is at all possible to change what happens when we use, for instance, [] to create a list. Is there a way to modify how the parser behaves so that, say, ["hello world"] calls print("hello world") instead of creating a list with one element?

I've attempted to find any documentation or posts about this but failed to do so.

Below is an example of replacing the built-in dict to instead use a custom class:

from __future__ import annotations
from typing import List, Any
import builtins


class Dict(dict):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__dict__ = self

    def subset(self, keys: List[Any]) -> Dict:
        return Dict({key: self[key] for key in keys})


builtins.dict = Dict

When this module is imported, it replaces the dict built-in with the Dict class. However, this only works when we call dict() directly. If we use {} instead, we still get the built-in dict:

import new_dict

a = dict({'a': 5, 'b': 8})
b = {'a': 5, 'b': 8}

print(type(a))
print(type(b))

Yields:

<class 'py_extensions.new_dict.Dict'>
<class 'dict'>
Jack Avante
  • Check out documentation on the ast module –  Mar 28 '22 at 03:54
  • You can't override built-ins from within Python. You will have to modify the actual interpreter implementation (such as CPython). – Selcuk Mar 28 '22 at 04:02
  • @Selcuk You actually can override built-ins. I've managed to completely replace the built-in dict class, so `dict()` actually returns my class. I can't seem to, however, force `{}` to also use my class – Jack Avante Mar 28 '22 at 04:03
  • @JackAvante That sounds interesting. How'd you do it? – Scene Mar 28 '22 at 04:05
  • @JackAvante True, you can shadow built-in names. I should have been more specific. – Selcuk Mar 28 '22 at 04:06
  • @SeanXie `def mydict(): ...` and then `dict = mydict`. – Selcuk Mar 28 '22 at 04:06
  • @Selcuk for some reason, I was expecting some ingenious code manipulation haha... – Scene Mar 28 '22 at 04:10
  • @SeanXie Edited my code to show what I'm doing as well – Jack Avante Mar 28 '22 at 04:10
  • Related: [Is it possible to overload Python assignment?](https://stackoverflow.com/questions/11024646/is-it-possible-to-overload-python-assignment) – Mateen Ulhaq Mar 28 '22 at 04:28
  • One possibility is to do this via transpiling. Then, when you "import" a `.py_better` file, it transpiles it into Python and imports that. – Mateen Ulhaq Mar 28 '22 at 04:31
  • Very related: https://stackoverflow.com/q/19083160/476 – deceze Mar 31 '22 at 12:33
  • @deceze It's fantastic that you found that! I completely missed that one when looking for potential answers. It nearly answers my question fully, but I'm not sure yet whether I should close the question and mark it as a duplicate, since I'm still interested in how `{ }` is actually implemented in Python, because I developed a way to extend `dict` while still ensuring that other libraries can use it as normal. – Jack Avante Mar 31 '22 at 14:01
  • Additionally, it would be great for someone to maybe even post a direct solution using MacroPy (maybe I'll manage to myself now that I know about it :)) – Jack Avante Mar 31 '22 at 14:12
  • "I've attempted to find any documentation or posts about this but failed to do so." When you look at the [main page for Python library documentation](https://docs.python.org/3/library/index.html), do you see the sections titled `Custom Python Interpreters` and `Python Language Services`? Do the descriptions of any of those modules seem relevant? – Karl Knechtel Apr 02 '22 at 06:28
  • @KarlKnechtel I've checked them out but those seem to be 'in python' interpreters rather than a way to modify the base interpreter – Jack Avante Apr 05 '22 at 09:58
  • I don't understand the distinction you are drawing. It seems as though you want to do some kind of magic that allows you to run `python.exe my_hax_script.py` and have the contents interpreted by different rules than the normal Python syntax. But *how do you want to specify that the magic should occur*? The obvious and most natural way is that you *don't* use `python.exe` to run the script, but instead build your own interpreter, leveraging Python's and making modifications. That is what those modules are for. – Karl Knechtel Apr 06 '22 at 08:45
  • @KarlKnechtel There seems to be a wrapper for MacroPy, mentioned in an earlier comment, that allows macros to be executed directly in the main file on import, without running `macropy myscript.py`. So I do believe it's possible. MacroPy is just quite complicated to use, but it should be able to replace the call for the builtin dict in the AST, which would likely be equivalent to replacing the opcode but at a higher level, as mentioned in the current best answer – Jack Avante Apr 06 '22 at 09:43
  • https://stackoverflow.com/questions/214881/can-you-add-new-statements-to-pythons-syntax has a good answer on modifying the Python interpreter code to add new statements (which would point you in the right direction to modify an existing one); other answers also show how to mess with any syntax using the `# coding: ...` thing to rewrite code to achieve the same result (i.e. you could rewrite `b = {...}` to be transparently `b = new_dict.Dict({...})` or similar) – dbr Apr 06 '22 at 13:05
  • @dbr Awesome that you found that! That provides a lot of context and cool methods to achieve this. Even though it dates back to Python 2, I don't believe this has changed much. I would accept this as an answer if you also included an illustrative example. – Jack Avante Apr 06 '22 at 15:32

2 Answers

[] and {} are compiled to dedicated opcodes that always build a list or a dict, respectively. list() and dict(), on the other hand, compile to bytecode that looks up the names list and dict (first in the module's globals, then in builtins) and calls the result as a function:

import dis

dis.dis(lambda:[])
dis.dis(lambda:{})
dis.dis(lambda:list())
dis.dis(lambda:dict())

prints the following (with some additional newlines for clarity; exact opcode names and offsets vary between CPython versions):

  3           0 BUILD_LIST               0
              2 RETURN_VALUE

  5           0 BUILD_MAP                0
              2 RETURN_VALUE

  7           0 LOAD_GLOBAL              0 (list)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE

  9           0 LOAD_GLOBAL              0 (dict)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE

Thus you can overwrite what dict() returns simply by overwriting the global dict, but you can't overwrite what {} returns.
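
The same difference can be checked programmatically rather than by reading the disassembly. This is only a minimal sketch: the helper names `opnames` and `global_names_loaded` are made up for illustration, and it assumes CPython, where an empty `{}` compiles to `BUILD_MAP`.

```python
import dis


def opnames(func):
    """Return the opcode names in func's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]


def global_names_loaded(func):
    """Return the names that func looks up via LOAD_GLOBAL."""
    return [
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname == "LOAD_GLOBAL"
    ]


literal_ops = opnames(lambda: {})
literal_globals = global_names_loaded(lambda: {})
call_globals = global_names_loaded(lambda: dict())

# The literal builds the dict directly and never looks up a name.
print("BUILD_MAP" in literal_ops, literal_globals)
# The call has to look up the name `dict`, which is what can be replaced.
print(call_globals)
```

Only the second form goes through a name lookup, which is why replacing `builtins.dict` affects `dict()` but not `{}`.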

These opcodes are documented in the dis module documentation. If the BUILD_MAP opcode runs, you get a dict; there is no way around it. For example, CPython's implementation of BUILD_MAP calls the function _PyDict_FromItems. It doesn't consult any user-defined classes; it specifically builds the C struct that represents a Python dict.

It is possible, in at least some cases, to manipulate Python bytecode at runtime. If you really wanted {} to return a custom class, I suppose you could write some code that searches for the BUILD_MAP opcode and replaces it with the appropriate opcodes. Those opcodes aren't the same size, though, so there are probably quite a few additional changes you'd have to make.
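
As a rough illustration of just the first step (finding the opcodes, not rewriting them), something like the following could work. `find_map_builds` and `MAP_OPS` are hypothetical names; `BUILD_CONST_KEY_MAP` is included on the assumption that some CPython versions emit it for literals whose keys are all constants.

```python
import dis
import types

# Opcodes assumed to build a dict from a literal (CPython-specific).
MAP_OPS = {"BUILD_MAP", "BUILD_CONST_KEY_MAP"}


def find_map_builds(code):
    """Yield (code name, offset, opname) for each dict-building opcode,
    recursing into nested code objects such as function bodies."""
    for ins in dis.get_instructions(code):
        if ins.opname in MAP_OPS:
            yield code.co_name, ins.offset, ins.opname
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from find_map_builds(const)


source = "x = {}\ndef f():\n    return {}\n"
hits = list(find_map_builds(compile(source, "<demo>", "exec")))
for name, offset, opname in hits:
    print(name, offset, opname)
```

Actually replacing those instructions would mean building a new code object with adjusted offsets and jump targets, which this sketch does not attempt.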

Chris

The ast module is an interface to Python's abstract syntax tree (AST), which is built when Python code is parsed.
It's possible to replace a dict literal ({}) with a call to dict by modifying the AST of the Python code.

import ast
import new_dict

a = dict({"a": 5, "b": 8})
b = {"a": 5, "b": 8}

print(type(a))
print(type(b))
print(type({"a": 5, "b": 8}))

src = """

a = dict({"a": 5, "b": 8})
b = {"a": 5, "b": 8}

print(type(a))
print(type(b))
print(type({"a": 5, "b": 8}))

"""

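class RewriteDict(ast.NodeTransformer):
    def visit_Dict(self, node):
        # rewrite dict literals nested inside this one first
        self.generic_visit(node)
        # don't replace the literal in `dict({"a": 1})`
        parent = getattr(node, "parent", None)
        if (
            isinstance(parent, ast.Call)
            and isinstance(parent.func, ast.Name)
            and parent.func.id == "dict"
        ):
            return node
        # replace `{"a": 1}` with `dict({"a": 1})`
        new_node = ast.Call(
            func=ast.Name(id="dict", ctx=ast.Load()),
            args=[node],
            keywords=[],
        )
        # reuse the literal's source position for the new call
        return ast.fix_missing_locations(ast.copy_location(new_node, node))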


tree = ast.parse(src)

# attach a reference to its parent to every node
for node in ast.walk(tree):
    for child in ast.iter_child_nodes(node):
        child.parent = node

RewriteDict().visit(tree)
exec(compile(tree, "ast", "exec"))

Output (the first three lines come from the code run directly, the last three from the rewritten src):

<class 'new_dict.Dict'>
<class 'dict'>
<class 'dict'>
<class 'new_dict.Dict'>
<class 'new_dict.Dict'>
<class 'new_dict.Dict'>
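
To see what the rewrite produces, the transformed tree can be turned back into source with ast.unparse (Python 3.9+). The sketch below is a smaller, self-contained variant of the same idea, not the code above: `Tagged` and `WrapDictLiterals` are hypothetical names, and the custom class is supplied through the exec namespace instead of by patching builtins.

```python
import ast


class Tagged(dict):
    """Hypothetical stand-in for the custom Dict class."""


class WrapDictLiterals(ast.NodeTransformer):
    def visit_Dict(self, node):
        # Handle nested literals first, then wrap this one in dict(...).
        self.generic_visit(node)
        call = ast.Call(
            func=ast.Name(id="dict", ctx=ast.Load()),
            args=[node],
            keywords=[],
        )
        return ast.copy_location(call, node)


tree = ast.parse("b = {'a': 5}")
tree = ast.fix_missing_locations(WrapDictLiterals().visit(tree))

# Show the rewritten source (requires Python 3.9+).
rewritten = ast.unparse(tree)
print(rewritten)

# Run the rewritten code with `dict` bound to the custom class.
namespace = {"dict": Tagged}
exec(compile(tree, "<ast>", "exec"), namespace)
print(type(namespace["b"]).__name__)
```

Binding `dict` in the namespace passed to exec keeps the change local to the rewritten code, so other modules still see the real built-in.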
Nizam Mohamed