
I'm trying to overload some methods of the string builtin. I know there is no really legitimate use-case for this, but the behavior still bugs me so I would like to get an explanation of what is happening here:

I'm using Python 2 and the forbiddenfruit module.

>>> from forbiddenfruit import curse
>>> curse(str, '__repr__', lambda self:'bar')
>>> 'foo'
'foo'
>>> 'foo'.__repr__()
'bar'

As you can see, the __repr__ function has been successfully overloaded, but it isn't actually called when we ask for a representation. Why is that?

Then, how would I get the expected behaviour:

>>> 'foo'
'bar'

There is no constraint about setting up a custom environment. If rebuilding Python is what it takes, so be it, but I really don't know where to start, and I still hope there is an easier way :)

gerrit
Centime
  • What problem are you trying to solve (that makes you want to overload built-in methods)? – loopbackbee Sep 26 '14 at 14:11
  • possible duplicate of [Overriding special methods on an instance](http://stackoverflow.com/questions/10376604/overriding-special-methods-on-an-instance) – njzk2 Sep 26 '14 at 15:11
  • @goncalopp : What I'm trying to do is to have a running Python shell in which the __repr__ called for any string is replaced by a custom method of mine. It doesn't have to be used in any program that will have to run on any other Python interpreter, so as I said, monkey-patching my own Python would be fine, if only I knew the way to do it. – Centime Sep 26 '14 at 16:00
  • @njzk2 : I'm not trying to override the method on the instance, but indeed on the class. I tried with forbiddenfruit because python won't let me do it for a builtin type such as str. – Centime Sep 26 '14 at 16:02
  • Once you've called `curse...`, what does `str.__dict__['__repr__']` tell you? – njzk2 Sep 26 '14 at 16:07
  • Also, what does `str.__repr__('foo')` return? – njzk2 Sep 26 '14 at 16:24
  • I just thought of this: isn't `__repr__` only called when you are trying to output something that is not actually a `str`? In the case of a `str`, isn't the object output directly? – njzk2 Sep 26 '14 at 16:26
  • str.__dict__['__repr__'] : , and str.__repr__('foo') : 'bar'. Everything seems OK to me... And what do you mean by 'output directly'? I tried with __str__ and it doesn't change anything. – Centime Sep 26 '14 at 16:39
  • Here is what it does with a list instead of str: `>>> curse(list, '__repr__', lambda self:'bar')` `>>> [].__repr__()` `'bar'` `>>> []` `[]` – Centime Sep 26 '14 at 16:45
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/62005/discussion-between-centime-and-njzk2). – Centime Sep 26 '14 at 16:45

1 Answer


The first thing to note is that whatever forbiddenfruit is doing, it's not affecting repr at all. This isn't a special case for str; it just doesn't work like that:

import forbiddenfruit

class X:
    repr = None

repr(X())
#>>> '<X object at 0x7f907acf4c18>'

forbiddenfruit.curse(X, "__repr__", lambda self: "I am X")

repr(X())
#>>> '<X object at 0x7f907acf4c50>'

X().__repr__()
#>>> 'I am X'

X.__repr__ = X.__repr__

repr(X())
#>>> 'I am X'

I recently found a much simpler way of doing what forbiddenfruit does thanks to a post by HYRY:

import gc

underlying_dict = gc.get_referents(str.__dict__)[0]
underlying_dict["__repr__"] = lambda self: print("I am a str!")

"hello".__repr__()
#>>> I am a str!

repr("hello")
#>>> "'hello'"

So we know, somewhat anticlimactically, that something else is going on.

Here's the source for builtin_repr:

static PyObject *
builtin_repr(PyModuleDef *module, PyObject *obj)
/*[clinic end generated code: output=988980120f39e2fa input=a2bca0f38a5a924d]*/
{
    return PyObject_Repr(obj);
}

And for PyObject_Repr (sections elided):

PyObject *
PyObject_Repr(PyObject *v)
{
    PyObject *res;
    res = (*v->ob_type->tp_repr)(v);
    if (res == NULL)
        return NULL;
}

The important point is that instead of looking up in a dict, it looks up the "cached" tp_repr attribute.
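The same effect can be observed from pure Python without touching any built-in. The sketch below assumes Python 3 and uses a hypothetical class named `Plain`; it only illustrates that `repr()` goes through the type's slot, while an explicit `obj.__repr__()` call uses ordinary attribute lookup.

```python
class Plain:
    pass


obj = Plain()

# Attach a __repr__ to the instance only; the type's slot is untouched.
obj.__repr__ = lambda: "instance-level repr"

# Explicit attribute access finds the instance attribute.
explicit = obj.__repr__()

# repr() dispatches through type(obj), so it never sees the instance attribute.
implicit = repr(obj)

print(explicit)
print(implicit)
```

Here `explicit` is the patched text, while `implicit` is still the default `<... Plain object at 0x...>` form.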

Here's what happens when you set the attribute with something like TYPE.__repr__ = new_repr:

static int
type_setattro(PyTypeObject *type, PyObject *name, PyObject *value)
{
    if (!(type->tp_flags & Py_TPFLAGS_HEAPTYPE)) {
        PyErr_Format(
            PyExc_TypeError,
            "can't set attributes of built-in/extension type '%s'",
            type->tp_name);
        return -1;
    }
    if (PyObject_GenericSetAttr((PyObject *)type, name, value) < 0)
        return -1;
    return update_slot(type, name);
}

The first part is the thing preventing you from modifying built-in types. Then it sets the attribute generically (PyObject_GenericSetAttr) and, crucially, updates the slots.
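Both branches can be exercised from Python. This is a minimal sketch assuming Python 3, with a hypothetical class `Y` standing in for any ordinary (heap) type; the exact wording of the TypeError differs between CPython versions, so only the exception type is checked.

```python
class Y:
    pass


# Built-in types are not heap types, so the assignment is rejected.
blocked = False
try:
    str.__repr__ = lambda self: "bar"
except TypeError:
    blocked = True

# A class defined in Python is a heap type, so the assignment goes through
# the normal type attribute setter and repr() sees the new method.
Y.__repr__ = lambda self: "I am Y"
result = repr(Y())

print(blocked, result)
```

The second assignment succeeds because `Y` is a heap type, so the setter gets as far as `update_slot`.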

If you're interested in how that works, it's available here. The crucial points are:

  • It's not an exported function and

  • It modifies the PyTypeObject instance itself

so replicating it would require hacking into the PyTypeObject type itself.

If you want to do so, probably the easiest thing to try would be (temporarily?) setting the Py_TPFLAGS_HEAPTYPE bit in type->tp_flags on the str class. This would allow setting the attribute normally. Of course, there are no guarantees this won't crash your interpreter.

This is not what I want to do (especially not through ctypes) unless I really have to, so I offer you a shortcut.

You write:

Then, how would I get the expected behaviour:

>>> 'foo'
'bar'

This is actually quite easy using sys.displayhook:

sys.displayhook is called on the result of evaluating an expression entered in an interactive Python session. The display of these values can be customized by assigning another one-argument function to sys.displayhook.

And here's an example:

import sys

old_displayhook = sys.displayhook
def displayhook(object):
    if type(object) is str:
        old_displayhook('bar')
    else:
        old_displayhook(object)

sys.displayhook = displayhook

And then... (!)

'foo'
#>>> 'bar'

123
#>>> 123
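A slightly more general variant is sketched below, assuming Python 3. The formatter `custom_str_repr` is a hypothetical placeholder for whatever display you want. Unlike the hook above, it prints the custom text without quotes and keeps the real string in `_`; everything else is passed to `sys.__displayhook__`, the interpreter's saved original hook.

```python
import builtins
import sys


def custom_str_repr(value):
    """Hypothetical formatter; replace with whatever display you want."""
    return "bar"


def displayhook(value):
    # Mirror the default hook: None is never echoed or stored.
    if value is None:
        return
    if type(value) is str:
        # Print the custom text, but keep the real string in `_`.
        print(custom_str_repr(value))
        builtins._ = value
    else:
        # Defer to the interpreter's original hook for everything else.
        sys.__displayhook__(value)


sys.displayhook = displayhook
```

Because the check is `type(value) is str`, subclasses of str and strings nested inside containers such as lists are still displayed with their normal repr.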

On the philosophical point of why repr would be cached this way, first consider:

1 + 1

It would be a pain if this had to look up __add__ in a dictionary before calling it. CPython is slow as it is, so it caches lookups to the standard dunder (double underscore) methods. __repr__ is one of those, even if it is less common to need the lookup optimized. This is still useful to keep formatting ('%s'%s) fast.

Veedrac
  • This! Thank you, Veedrac! It's perfect, with lots of explanations and all. I'm only half-way through but it's already more than I was hoping for. – Centime Sep 27 '14 at 12:44
  • It [does not work](https://gist.github.com/metaperl/99103cdfbe675afeaa38564ebcea7288) – Terrence Brannon Apr 11 '17 at 10:15
  • @TerrenceBrannon Indeed, and I explain why it doesn't in this answer. It's likely I haven't been sufficiently clear, so if you'd explain why you think I've said otherwise it would really help me to fix up any ambiguities. – Veedrac Apr 11 '17 at 10:24
  • @Veedrac, would you mind having a look at this question, I wonder if what I'm trying to do is possible https://stackoverflow.com/questions/64611050/python-change-exception-printable-output-eg-overload-builtins Thanks – Orsiris de Jong Nov 03 '20 at 10:21