Before going any further: I am aware that one should never do this. This question is purely for educational purposes; I undertook this exercise to better understand Python's internals and ctypes, and how they work.

I am aware that it is relatively easy to change the value of integers in Python. In fact, there is a whole lot you can do by messing with the internals. From the C API reference:

The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
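
The caching described in that quote can be observed without touching any memory. The following is a minimal sketch, assuming CPython; `same_object` is a hypothetical helper name, and `int(text)` is used so the values are built at run time rather than folded by the compiler. On CPython the first two checks are expected to report the same object and the third a different one.

```python
# CPython-specific sketch: small ints in the cached range are shared objects.
# int(text) builds the value at run time, so compiler constant sharing
# does not influence the result.
def same_object(text):
    """Return True if parsing `text` twice yields the very same int object."""
    return int(text) is int(text)

print(same_object("256"))      # inside the cached range
print(same_object("-5"))       # inside the cached range
print(same_object("1000000"))  # well outside the cached range
```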

Considering the value of 1 is cached by CPython, it should be relatively easy (or, at least possible) to do this. After a little digging around, I found ctypes was the way to go. However, most of what I try results in a segfault. I got close by changing the value of 2.

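import ctypes

def deref(addr, typ):
    # Reinterpret a raw address as a pointer to `typ`.
    return ctypes.cast(addr, ctypes.POINTER(typ))

# On a 64-bit CPython 3.6 build, index 6 in c_int units (byte offset 24)
# is the first digit of the cached int object 2; this overwrites it.
deref(id(2), ctypes.c_int)[6] = 1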

1 + 1 now gives incorrect results (a step in the right direction), but I cannot get it to evaluate to "3":

>>> 1 + 1
1

>>> 1 + 2
1

>>> 1 + 3
[1]    61014 segmentation fault  python3.6
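
The index 6 used with `deref` can be sanity-checked with a read-only variant that never writes to the object. This is a sketch under stated assumptions: CPython, where `id(obj)` is the object's address and the digit array of an int starts at byte offset `int.__basicsize__`. The names `DIGIT_OFFSET`, `DIGIT_TYPE` and `first_digit` are made up for illustration.

```python
import ctypes
import sys

# Assumption: CPython, where id(obj) is the object's address and the digit
# array of an int starts at byte offset int.__basicsize__.
DIGIT_OFFSET = int.__basicsize__

# Digits are 4 bytes wide on typical builds (2 bytes on 15-bit-digit builds).
DIGIT_TYPE = ctypes.c_uint32 if sys.int_info.sizeof_digit == 4 else ctypes.c_uint16

def first_digit(n):
    """Read (never write) the lowest digit of the int object n."""
    return DIGIT_TYPE.from_address(id(n) + DIGIT_OFFSET).value

# The offset expressed in c_int units; on a common 64-bit build this
# corresponds to the index used in the question.
print(DIGIT_OFFSET // ctypes.sizeof(ctypes.c_int))
print(first_digit(2))
```

Reading the digit is harmless; the trouble in the question starts only once that slot is written to.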

I have tried similar things, all ending in failure, with abarnert's internals module. Is there any way to have 1 + 1 evaluate to 3 in Python? Or is "1" so all-important that there is no way of making this work without segfaulting my interpreter?

cs95
  • 1
    Seems like you could have done `deref(id(2), ctypes.c_int)[6] = 3`. – user2357112 Dec 27 '18 at 19:58
  • 6
    I'm voting to close this question as off-topic because once you start hacking around to this degree you can probably achieve almost anything, most of it without any purpose or use, and that's not really what we're here for. – Lightness Races in Orbit Dec 27 '18 at 19:58
  • 2
    @LightnessRacesinOrbit Fair enough, the only purpose of this was to understand the internals is all :-) – cs95 Dec 27 '18 at 19:59
  • @user2357112 Yeah! It worked! Along with a cryptic, seemingly unrelated `OSError`. Hmm, I would still like to understand why, unless it's undefined behaviour in which case I wouldn't need any further explanation. – cs95 Dec 27 '18 at 20:00
  • @coldspeed Have fun! – Lightness Races in Orbit Dec 27 '18 at 20:00
  • 2
    It's undefined behavior. Of course it's undefined behavior. How could you expect anything else when you're messing with the value of 2? The specific way the undefined behavior manifests in my tests is that a `write` call that writes 2 characters returns 2, except that the 2 object has value 3, and 3 is more characters than should have been printed. – user2357112 Dec 27 '18 at 20:04
  • @user2357112 Even though it seems like this question might be closed, I'd still like to reward your comments if you would write an answer. – cs95 Dec 27 '18 at 20:04
  • I'm voting to close this question as off-topic because intentionally subverting the language functionality is not a demonstrably useful programming tool. – Prune Dec 27 '18 at 20:07
  • 1
    @Prune I understand. I recall a similar question being asked in java before which was received well, so I assumed this would be fine as long as it was treated as a pedagogic tool for understanding the internals. Happy to go with the community's vote, however – cs95 Dec 27 '18 at 20:09
  • @coldspeed: I'm also satisfied to go with community vote; *I* think this is a borderline question; I won't lose any sleep if we don't get 5 votes. – Prune Dec 27 '18 at 20:12
  • It was possible to do this sort of thing by accident in IBM's Fortran IV G compiler. That was widely acknowledged to be a bug. – BoarGules Dec 27 '18 at 20:31
  • Might be a bit late to the party; this **works just fine if run from a script** instead of the interactive shell. After some digging around in CPython's source, it seems the segfault is due to `deref` corrupting the Python compiler's constant cache dictionary. As the code is compiled *and* executed line-by-line in the shell, the corrupted dictionary is then being used to compile the subsequent statements. ... – meowgoesthedog Dec 27 '18 at 21:56
  • ... This affects (`1 + 3` =) 4 instead of 2 because the dictionary gets converted to an array which is missing an element at the end (corresponding to `4`). In contrast, the script is compiled as a whole before it is executed, so the corrupted dictionary is never used. – meowgoesthedog Dec 27 '18 at 21:57
  • You *might* (stress) be able to achieve your goal by over-writing Python's addition functions for `int/long` (see `PyLong_Type`/`long_as_number` in `Objects/longobject.c`), which would require some even dirtier C code. – meowgoesthedog Dec 27 '18 at 22:04
  • Lots of potentially fun answers to this one... – cs95 Dec 28 '18 at 02:12

1 Answer

Disclaimer: this answer refers to CPython only; I might have also missed the point of the question...

I was able to (kinda) achieve this by writing a Python extension in C.

In Objects/intobject.c (Python 2) there is a type struct, PyInt_Type. Its tp_as_number field points to a table of operator functions, whose nb_add field is the addition operator:

// the function in the same file that nb_add points to
static PyObject *
int_add(PyIntObject *v, PyIntObject *w)
    ...

PyInt_Type is an exported global variable, and its address can be retrieved with dlsym on Unix or GetProcAddress on Windows:

#include <dlfcn.h>

...

// symbol look-up from the Python extension
void* addr = dlsym(RTLD_DEFAULT, "PyInt_Type");

// pointer to PyInt_Type
PyTypeObject *int_type = addr;

// pointer to int_as_number (PyInt_Type.tp_as_number)
PyNumberMethods *int_funcs = int_type->tp_as_number;

// save the original int_add (tp_as_number->nb_add);
// int_add_orig is a file-scope variable of type binaryfunc
int_add_orig = int_funcs->nb_add;

// override this with a custom function
int_funcs->nb_add = (binaryfunc)int_add_new;

...

// custom add function
PyObject *int_add_new(PyIntObject *v, PyIntObject *w)
{
    long a = PyInt_AS_LONG(v);
    long b = PyInt_AS_LONG(w);

    // 1 + 1 = 3 special case
    if (a == 1 && b == 1) {
        return PyInt_FromLong(3);
    }

    // for all other cases default to the
    // original add function which was retrieved earlier
    return int_add_orig((PyObject *)v, (PyObject *)w);
}
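
The symbol look-up step can also be tried from Python itself, without writing an extension. This is a read-only sketch, assuming CPython 3, where the int type object is exported under the name `PyLong_Type` (the `PyInt_Type` symbol above exists only in Python 2). It only checks that the looked-up address matches `id(int)` and modifies nothing.

```python
import ctypes

# Assumption: CPython 3, where the int type object is exported as PyLong_Type
# (PyInt_Type exists only in Python 2).
symbol = ctypes.c_char.in_dll(ctypes.pythonapi, "PyLong_Type")
type_address = ctypes.addressof(symbol)

# In CPython, id() of an object is its memory address, so the address found
# by symbol look-up should be the address of the `int` type object.
print(type_address == id(int))
```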

By preserving all of the original code and internal variables, the new code avoids the previously experienced segfaults:

>>> # load the extension

>>> import [...]

>>> 1 + 1
2

>>> # call the extension function which overloads the add operator

>>> 1 + 1
3

>>> 1 + 0
1

>>> 1 + 2
3

>>> 1 + 3
4
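
The wrap-and-delegate pattern used by the custom add function can be sketched in pure Python with an int subclass. This is only an analogue: `OddInt` is a hypothetical class, and it changes addition for its own instances only, whereas the C extension changes it for every int, including the literals in `1 + 1`.

```python
class OddInt(int):
    """int subclass whose addition special-cases 1 + 1 (illustrative only)."""

    def __add__(self, other):
        # 1 + 1 = 3 special case
        if isinstance(other, int) and self == 1 and other == 1:
            return 3
        # Every other case defers to the original int addition.
        return int.__add__(self, other)

one = OddInt(1)
print(one + one)  # goes through the special case
print(one + 2)    # ordinary addition
print(1 + 1)      # plain ints are not affected by the subclass
```

As in the C version, keeping the original addition around and calling it for all other inputs is what leaves the rest of integer arithmetic intact.
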
meowgoesthedog
  • I like the approach you've taken to answer this question! While this isn't originally what I had in mind when I posted the question, it's cool to know how to do this by actually modifying the source code. – cs95 Dec 28 '18 at 11:36
  • @coldspeed well, not quite modifying Python's source code (the goal was to avoid doing so directly, because it makes this problem conceptually trivial :D ) but its internal state variables at runtime (no different to your attempt with `deref`). – meowgoesthedog Dec 28 '18 at 11:37
  • Yes, I see, this uses the C API to write an extension. I am still digesting this, excuse my ignorance :) – cs95 Dec 28 '18 at 11:39