
I understand that __dict__ in obj.__dict__ is a descriptor attribute of type(obj), so the lookup for obj.__dict__ is type(obj).__dict__['__dict__'].__get__(obj).
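A quick check of that equivalence, as a minimal sketch: it assumes CPython, and `Point` is a made-up class used only for illustration.

```python
class Point:
    pass

p = Point()
p.x = 1

# The class namespace holds a descriptor under the key '__dict__'.
descr = type(p).__dict__['__dict__']

# It defines both __get__ and __set__, so it is a data descriptor.
is_data_descriptor = hasattr(descr, '__get__') and hasattr(descr, '__set__')

# Calling the descriptor by hand yields the same object as p.__dict__.
same_object = descr.__get__(p) is p.__dict__

print(is_data_descriptor, same_object, p.__dict__)
```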

From https://stackoverflow.com/a/46576009

It's tempting to say that __dict__ has to be a descriptor because implementing it as a __dict__ entry would require you to find the __dict__ before you can find the __dict__, but Python already bypasses normal attribute lookup to find __dict__ when looking up other attributes, so that's not quite as compelling as it initially sounds. If the descriptors were replaced with a '__dict__' key in every __dict__, __dict__ would still be findable.

How does "Python already bypasses normal attribute lookup to find __dict__"? What does "normal attribute lookup" mean?

Judging from the context of the quote in the linked answer, I don't think the author was referring to the fact that the lookup for obj.__dict__ is type(obj).__dict__['__dict__'].__get__(obj).

Martijn Pieters
Tim
  • 3
    I've noticed, @Tim, that you often add a salutatory remark to your questions. While this is certainly considered polite when speaking with someone face to face, it only adds noise to posts on Stack Overflow. Please see [_Why are fellow users removing thank-you's from my questions?_](https://meta.stackoverflow.com/questions/328379/why-are-fellow-users-removing-thank-yous-from-my-questions) for further details. – Christian Dean Oct 05 '17 at 16:47
  • 1
    You may find this of interest: https://stackoverflow.com/questions/39379351/changing-getattr-during-instantiation – PM 2Ring Oct 05 '17 at 16:52

1 Answer


Normal attribute lookup is done by calling the __getattribute__ hook, or more precisely, the C-API tp_getattro slot. The default implementation for this is in the PyObject_GenericGetAttr C-API function.

It is the job of PyObject_GenericGetAttr to invoke descriptors if they exist, and to look at the instance __dict__. And indeed, there is a __dict__ descriptor, but it is faster for __getattribute__ to just access the __dict__ slot in the instance memory structure directly, and that is what the actual implementation does:

if (dict == NULL) {
    /* Inline _PyObject_GetDictPtr */
    dictoffset = tp->tp_dictoffset;
    if (dictoffset != 0) {
        if (dictoffset < 0) {
            Py_ssize_t tsize;
            size_t size;

            tsize = ((PyVarObject *)obj)->ob_size;
            if (tsize < 0)
                tsize = -tsize;
            size = _PyObject_VAR_SIZE(tp, tsize);
            assert(size <= PY_SSIZE_T_MAX);

            dictoffset += (Py_ssize_t)size;
            assert(dictoffset > 0);
            assert(dictoffset % SIZEOF_VOID_P == 0);
        }
        dictptr = (PyObject **) ((char *)obj + dictoffset);
        dict = *dictptr;
    }
}

Note the Inline _PyObject_GetDictPtr comment; this is a performance optimisation, as instance attribute lookups are frequent.
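The `tp_dictoffset` field read by that C code is visible from Python as `type.__dictoffset__`. The following is a minimal sketch, assuming CPython; `WithDict` and `WithSlots` are hypothetical classes for illustration. The exact non-zero value differs between CPython versions, so only zero versus non-zero is meaningful here.

```python
class WithDict:
    pass

class WithSlots:
    # No per-instance __dict__ is allocated for this class.
    __slots__ = ("x",)

# Non-zero: instances carry a dict pointer that the lookup code can reach directly.
print(WithDict.__dictoffset__)

# Zero: there is no dict pointer, so the `dictoffset != 0` branch is skipped.
print(WithSlots.__dictoffset__)
print(int.__dictoffset__)
```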

If you try to access instance.__dict__ from Python code, then the descriptor is invoked; it is a data descriptor, so it takes precedence and is invoked before instance attributes are even looked at.
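One way to observe the two paths separately is to shadow the descriptor with a property. This is only a sketch under the assumption of CPython behaviour; `Base` and `Shadowed` are made-up names. Ordinary attribute access keeps working because it goes through the instance's dict pointer, while `instance.__dict__` in Python code goes through whatever descriptor is found on the class.

```python
class Base:
    pass

class Shadowed(Base):
    # A property is a data descriptor, so it wins for `s.__dict__`.
    @property
    def __dict__(self):
        return {"fake": True}

s = Shadowed()
s.x = 1  # stored in the real instance dict, not via the property

# Normal attribute lookup still finds x: it does not consult the property.
found_x = s.x

# Python-level access to __dict__ invokes the shadowing descriptor.
via_python = s.__dict__

# The original descriptor, still present on Base, returns the real dict.
real_dict = Base.__dict__["__dict__"].__get__(s)

print(found_x, via_python, real_dict)
```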

sanyassh
Martijn Pieters
  • Thanks. Does the lookup process for `__dict__` also apply to other attributes whose names both begin and end with two underscores, such as `__name__`, `__class__`, `__new__`, `__init__`, `__str__`, `__repr__`? – Tim Oct 05 '17 at 20:48
  • [will the lookup for self-defined attributes whose names both begin and end with two underscores (e.g. `__madeupAttribute__`) be the same as the lookup for those special attributes?](https://stackoverflow.com/q/46594480) – Tim Oct 05 '17 at 20:56
  • @Tim: no, this isn't about special names, that `__dict__` is a special name has nothing to do with it. The attribute lookup process needs access to that specific object *every time you look up an attribute that is not a data descriptor*, so the access path is optimised. – Martijn Pieters Oct 05 '17 at 21:26
  • 1
    @Tim: Special names are subject to [different lookup rules](https://docs.python.org/3/reference/datamodel.html#special-lookup), but that happens in the code trying to access them. For example, the `hash()` function will explicitly use `type(obj).__hash__(obj)` rather than `obj.__hash__()`. It is the responsibility of the code accessing special attributes to treat them specially. – Martijn Pieters Oct 05 '17 at 21:28
  • @Tim: double-underscore names are "reserved", by convention only, see the [*Reserved classes of identifiers* section](https://docs.python.org/3/reference/lexical_analysis.html#reserved-classes-of-identifiers) of the lexical analysis documentation. You are free to make up your own, but you can't count on forward compatibility. If you want to use `__madeupAttribute__` and a future Python version wants to assign special meaning to that name, then your code could break on that Python version. – Martijn Pieters Oct 05 '17 at 21:31
  • Thanks. "And indeed, there is a `__dict__` descriptor, but it is faster for `__getattribute__` to just access the `__dict__` slot in the instance memory structure directly". Do you mean that there is a `__dict__` descriptor as an attribute of the class, and there is another different `__dict__` non-descriptor attribute of the instance? The lookup of `instance.__dict__` isn't by checking the descriptor attribute of `type(instance)`, but by looking directly at the `__dict__` non-descriptor attribute of the instance? – Tim Oct 05 '17 at 23:27
  • 1
    @Tim: no, I don't. In the end, all your data lives in different places in memory. The goal of `__getattribute__` is to get the right memory location for the instance attribute dictionary (the object that `instance.__dict__` returns), so they need a *pointer*. That pointer is stored in the memory block reserved for the instance data. `__getattribute__` could get that pointer by using the `__dict__` descriptor object; that would make the C code nice and simple. But that would not be as fast as just going to the instance memory location directly, which is what the code I linked to does. – Martijn Pieters Oct 06 '17 at 07:53
  • 1
    @Tim: so rather than do `__getattribute__` -> `class.__dict__` -> *find descriptor object associated with the `__dict__` key* -> `descriptor_object.__get__(instance, class)` -> *return slot offset for instance memory region*, the much shorter path `__getattribute__` -> *slot offset for instance memory region* is used. – Martijn Pieters Oct 06 '17 at 07:56
  • Thanks. Is it correct that the lookup for `instance.__dict__` abandons the descriptor attribute `__dict__` of `type(instance)`? Why is `instance.__dict__` said to be the descriptor attribute of `type(instance)`, and if so, what is the purpose? – Tim Oct 06 '17 at 11:54
  • @Tim: not sure what you mean there. The [descriptor protocol](https://docs.python.org/3/howto/descriptor.html) only ever looks at descriptors on `type(instance)`. Since `__dict__` is a *data descriptor* the code will never look at instance attributes. – Martijn Pieters Oct 06 '17 at 12:41
  • Sorry that I wasn't explicit. Does this lookup process "the much shorter path `__getattribute__` -> slot offset for instance memory region is used" not make any use of the descriptor attribute `__dict__` of `type(instance)` at all? If that is correct, why is `instance.__dict__` said to be the descriptor attribute of `type(instance)`? – Tim Oct 06 '17 at 12:46
  • @Tim: yes, that's correct. But `__getattribute__` is not the *only user of `__dict__`*. Python code that wants to access `instance.__dict__` will still want to be able to use it, and that's where the descriptor comes in. – Martijn Pieters Oct 06 '17 at 12:54
  • Thanks. When `instance.__dict__` appears in a Python program, is `__getattribute__` always implicitly called to look up the attribute `__dict__`? If not, when is `__getattribute__` called implicitly, and when is it not? – Tim Oct 06 '17 at 13:06
  • 1
    @Tim: `__getattribute__` is called, because there is a `.` attribute lookup there. But the descriptor is found before `__getattribute__` needs to look at instance attributes, so it is used and the result returned. If you used `instance.some_other_name` and `some_other_name` is not found on the class or not a data descriptor, *then* `__getattribute__` will get the instance attribute dictionary directly from the pointer. – Martijn Pieters Oct 06 '17 at 13:42
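
The special-lookup rule mentioned in the comments above (that `hash()` uses `type(obj).__hash__(obj)` rather than `obj.__hash__()`) can be seen with a short sketch; `Token` is a hypothetical class for illustration.

```python
class Token:
    def __hash__(self):
        return 1

t = Token()

# An instance attribute with a special name; a plain function on the class
# is a non-data descriptor, so the instance attribute wins for `t.__hash__`.
t.__hash__ = lambda: 2

# hash() looks the method up on the type and ignores the instance attribute.
from_builtin = hash(t)

# Explicit attribute access goes through normal lookup and finds the lambda.
from_attribute = t.__hash__()

print(from_builtin, from_attribute)
```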