10

I'd like to monkey-patch Python lists, in particular, replacing the __setitem__ method with custom code. Note that I am not trying to extend, but to overwrite the builtin types. For example:

>>> # Monkey Patch  
... # Replace list.__setitem__ with a Noop
...
>>> myList = [1,2,3,4,5]
>>> myList[0] = "Nope"
>>> myList
[1, 2, 3, 4, 5]

Yes, I know that is a downright perverted thing to do to Python code. No, my use case doesn't really make sense. Nonetheless, can it be done?

Possible avenues:

  • Setting a read only attribute on builtins using ctypes
  • The forbiddenfruit module allows patching of C builtins, but does not work when trying to override the list methods
  • This Gist also manages monkey patching of builtins by manipulating the object's dictionary. I've updated it to Python 3 here, but it still doesn't allow overriding of the methods.
  • The Pyrthon library overrides the list type in a module to make it immutable by using AST transformation. This could be worth investigating.

Demonstrative example

I actually manage to override the methods themselves, as shown below:

import ctypes

def magic_get_dict(o):
    # find address of dict whose offset is stored in the type
    dict_addr = id(o) + type(o).__dictoffset__
    # retrieve the dict object itself
    dict_ptr = ctypes.cast(dict_addr, ctypes.POINTER(ctypes.py_object))
    return dict_ptr.contents.value

def magic_flush_mro_cache():
    ctypes.PyDLL(None).PyType_Modified(ctypes.cast(id(object), ctypes.py_object))

print(list.__setitem__)
dct = magic_get_dict(list)
dct['__setitem__'] = lambda s, k, v: s
magic_flush_mro_cache()
print(list.__setitem__)

x = [1,2,3,4,5]
print(x.__setitem__)
x.__setitem__(0,10)
x[1] = 20
print(x)

Which outputs the following:

➤ python3 override.py
<slot wrapper '__setitem__' of 'list' objects>
<function <lambda> at 0x10de43f28>
<bound method <lambda> of [1, 2, 3, 4, 5]>
[1, 20, 3, 4, 5]

But as shown in the output, this doesn't affect the normal syntax for setting an item: the explicit x.__setitem__(0, 10) call is ignored, while x[1] = 20 still modifies the list.

Alternative: Monkey patching an individual list instance

As a lesser alternative, if I were able to monkey-patch an individual list instance, that could work too. Perhaps by changing the class pointer of the list to a custom class.
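
For reference, a minimal sketch of what this alternative runs into, assuming CPython 3 (the class name `FrozenItems` is made up for illustration). A subclass can turn `__setitem__` into a no-op, but as far as I can tell the class pointer of an existing plain list cannot be reassigned from Python, and list methods implemented in C still mutate the object:

```python
class FrozenItems(list):
    """Hypothetical list subclass whose item assignment is silently ignored."""

    def __setitem__(self, key, value):
        pass  # deliberately do nothing


frozen = FrozenItems([3, 1, 2])
frozen[0] = "Nope"          # dispatched to FrozenItems.__setitem__, so ignored
after_assignment = list(frozen)

frozen.sort()               # implemented in C, does not go through __setitem__
after_sort = list(frozen)

plain = [3, 1, 2]
try:
    plain.__class__ = FrozenItems   # attempt to swap the class pointer
    swap_error = None
except TypeError as exc:
    swap_error = exc                # CPython refuses this for built-in types

print(after_assignment, after_sort, type(swap_error).__name__)
```

So this only helps for lists that are created as the subclass in the first place, and even then only for the `x[i] = v` syntax.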

brice
  • 4
    I'd strongly suspect that doing this, if you found a way, would break the interpreter when it tries to update its own internal list objects. – Blckknght Jul 08 '16 at 01:23
  • I know my own experiments with monkey-patching core types have ended in messy crashes. – user2357112 Jul 08 '16 at 01:39
  • 2
    "`type(list)` returns ``" - what Python 2 release are you on, 2.0? `type(list) is type` on any remotely modern Python 2 release. – user2357112 Jul 08 '16 at 01:40
  • 3
    Possible duplicate of [Extension method for python built-in types!](http://stackoverflow.com/questions/6738987/extension-method-for-python-built-in-types) – zondo Jul 08 '16 at 01:46
  • @user2357112 Yup, quite right... Must have messed up the shell I tried it in. – brice Jul 08 '16 at 01:46
  • @zondo, Well, I'm not trying to extend, but to _overwrite_ the types – brice Jul 08 '16 at 01:48
  • 3
    If you want to keep digging with this, I'd try temporarily clearing the `Py_TPFLAGS_HEAPTYPE` flag on the `list` type and then just assigning to `list.__setitem__` the regular way. That'd call `update_slot`, which your current attempts aren't doing. Whether that'd be enough and what else it would break, I don't know. You might have to screw with `sq_ass_item`, `sq_ass_slice`, and `mp_ass_subscript` manually. – user2357112 Jul 08 '16 at 02:21
  • 4
    Even if you got all of this right, the C level code that directly calls the `PyList_*` or `PySequence_Fast*` APIs bypasses lookup of `__setitem__`/`mp_ass_subscript`/`sq_ass_item` entirely, so your code wouldn't be invoked. – ShadowRanger Aug 03 '16 at 10:19

2 Answers

4

A little late to the party, but nonetheless, here's the answer.

As user2357112 hinted in the comment above, modifying the dict won't suffice, since __getitem__ (and other double-underscore names) are mapped to their slots, and the slots won't be updated without calling update_slot (which isn't exported, so that would be a little tricky).
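
To illustrate the slot point on an ordinary class, here is a minimal sketch, assuming CPython (`Box` is a made-up class, not part of the original code). The `obj[key] = value` syntax looks the method up through the type, so an attribute placed on the instance is ignored, while a normal assignment on the class is picked up:

```python
class Box:
    """Hypothetical container used only for this illustration."""

    def __init__(self):
        self.items = [1, 2, 3]

    def __setitem__(self, key, value):
        self.items[key] = value


box = Box()

# An instance attribute named __setitem__ is ignored by the subscript
# syntax, which resolves the method through the type, not the instance.
box.__setitem__ = lambda key, value: None
box[0] = "a"
after_instance_patch = list(box.items)

# Assigning on the class goes through type.__setattr__, so the new
# function is what the subscript syntax ends up calling.
Box.__setitem__ = lambda self, key, value: None
box[1] = "b"
after_class_patch = list(box.items)

print(after_instance_patch, after_class_patch)
```

Writing straight into the type's dict skips that second path, which is why the question's attempt changes `list.__setitem__` as an attribute but not the behaviour of `x[1] = 20`.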

Inspired by the above comment, here's a working example of making __setitem__ a no-op for specific lists:

# assuming v3.8 (tested on Windows x64 and Ubuntu x64)
# definition of PyTypeObject: https://github.com/python/cpython/blob/3.8/Include/cpython/object.h#L177
# no extensive testing was performed and I'll let others decide if this is a good idea or not, but it's possible

import ctypes

Py_TPFLAGS_HEAPTYPE = (1 << 9)

# calculate the offset of the tp_flags field
offset  = ctypes.sizeof(ctypes.c_ssize_t) * 1 # PyObject_VAR_HEAD.ob_base.ob_refcnt
offset += ctypes.sizeof(ctypes.c_void_p)  * 1 # PyObject_VAR_HEAD.ob_base.ob_type
offset += ctypes.sizeof(ctypes.c_ssize_t) * 1 # PyObject_VAR_HEAD.ob_size
offset += ctypes.sizeof(ctypes.c_void_p)  * 1 # tp_name
offset += ctypes.sizeof(ctypes.c_ssize_t) * 2 # tp_basicsize+tp_itemsize
offset += ctypes.sizeof(ctypes.c_void_p)  * 1 # tp_dealloc
offset += ctypes.sizeof(ctypes.c_ssize_t) * 1 # tp_vectorcall_offset
offset += ctypes.sizeof(ctypes.c_void_p)  * 7 # tp_getattr+tp_setattr+tp_as_async+tp_repr+tp_as_number+tp_as_sequence+tp_as_mapping
offset += ctypes.sizeof(ctypes.c_void_p)  * 6 # tp_hash+tp_call+tp_str+tp_getattro+tp_setattro+tp_as_buffer

tp_flags = ctypes.c_ulong.from_address(id(list) + offset)
assert(tp_flags.value == list.__flags__) # should be the same

lst1 = [1,2,3]
lst2 = [1,2,3]
dont_set_me = [lst1] # these lists cannot be set

# define new method
orig = list.__setitem__
def new_setitem(self, *args):
    if [_ for _ in dont_set_me if _ is self]: # check for identical object in list
        print('Nope')
    else:
        return orig(self, *args)

tp_flags.value |= Py_TPFLAGS_HEAPTYPE # add flag, to allow type_setattro to continue
list.__setitem__ = new_setitem # set method, this will already call PyType_Modified and update_slot
tp_flags.value &= (~Py_TPFLAGS_HEAPTYPE) # remove flag

print(lst1, lst2)       # > [1, 2, 3] [1, 2, 3]
lst1[0],lst2[0]='x','x' # > Nope
print(lst1, lst2)       # > [1, 2, 3] ['x', 2, 3]

Edit
See here why it's not supported to begin with. Mainly, as explained by Guido van Rossum:

This is prohibited intentionally to prevent accidental fatal changes to built-in types (fatal to parts of the code that you never thought of). Also, it is done to prevent the changes to affect different interpreters residing in the address space, since built-in types (unlike user-defined classes) are shared between all such interpreters.

I also searched for all usages of Py_TPFLAGS_HEAPTYPE in cpython and they all seem to be related to GC or some validations.

So I guess if:

  • You don't change the type's structure (I believe the above doesn't)
  • You're not using multiple interpreters in the same process
  • You set the flag and immediately clear it again in a single-threaded state
  • You don't really do anything that can affect GC while the flag is set

You'll be just fine <generic disclaimer here>.
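
For reference, the guard itself can be observed without touching any memory. A minimal sketch, assuming CPython and the same `Py_TPFLAGS_HEAPTYPE` bit as in the snippet above (`Custom` is a made-up class for illustration). I believe newer versions (3.10+) key the check on a separate `Py_TPFLAGS_IMMUTABLETYPE` flag, so treat the flag handling as tied to 3.8:

```python
Py_TPFLAGS_HEAPTYPE = 1 << 9  # same value as in the snippet above


class Custom(list):
    """Hypothetical user-defined subclass; classes created in Python are heap types."""


# Built-in list is a static type, a class defined in Python is a heap type.
list_is_heap = bool(list.__flags__ & Py_TPFLAGS_HEAPTYPE)
custom_is_heap = bool(Custom.__flags__ & Py_TPFLAGS_HEAPTYPE)

# Plain assignment on the built-in type is rejected...
try:
    list.__setitem__ = lambda self, key, value: None
    builtin_error = None
except TypeError as exc:
    builtin_error = exc

# ...while the same assignment on the heap type is accepted and takes effect.
Custom.__setitem__ = lambda self, key, value: None
c = Custom([1, 2, 3])
c[0] = "x"

print(list_is_heap, custom_is_heap, type(builtin_error).__name__, c)
```

This only inspects the flag and never modifies it, so it is safe to run as a quick check before trying the ctypes approach.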

Eli Finkel
  • 1
    Works for dict too (python 3.8.10)! I just don't know how long it will endure across new versions. – dawid Jul 13 '21 at 14:15
0

Can't be done. If you do force that using ctypes, you will just crash the Python runtime faster than anything else, as many things internally just make use of Python data types.

jsbueno