Why do Python descriptors copy?

Question

(I edited the question, since I think it is still basically the same thing I'm asking, though I gained some understanding from the comments. I don't know if that's permitted, or I should have asked a new one.)

The following code

class A: c = lambda:0
a = A()
print(a.c is a.c)

prints False. I have learned it's because Python thinks A.c is a method, since c is assigned a function at class level. I have two questions:

(less important) How does Python decide whether something is a function? I thought it would have to be def'd explicitly if it were to become a method. "Arbitrary callable" obviously isn't the criterion: for example, builtin functions aren't accepted.
(more important) I learned that "Whenever you look up a method via class.name or instance.name, the method object is created a-new". Is there any implementation-independent reason why it is so? That is, is there any language feature that wouldn't work right if copies weren't made? (Of course, I know that a1.c is not a2.c, but for the same object a, could a.c always be the same object? Or at least, could A.c always be the same object?)

@IgnacioVazquez-Abrams ok, sorry. I thought whatever governs attribute access in a nontrivial way could be called descriptors. Still, the question remains: what's going on and how to explain it? — Veky, Oct 26 '14 at 04:44
possible duplicate of [python bound and unbound method object](http://stackoverflow.com/questions/13348031/python-bound-and-unbound-method-object) — Ignacio Vazquez-Abrams, Oct 26 '14 at 04:44
So, there are descriptors after all. :-) Anyway, yes, that answers some questions (if you assume access to c goes via a descriptor and access to b doesn't), but raises two new ones: why would c go through a descriptor and b not, and why does the descriptor protocol copy things it retrieves? — Veky, Oct 26 '14 at 04:59
`lambda` creates a function. Functions assigned at class level become methods. — Ignacio Vazquez-Abrams, Oct 26 '14 at 05:06
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/63653/discussion-between-veky-and-ignacio-vazquez-abrams). — Veky, Oct 26 '14 at 05:08
@IgnacioVazquez-Abrams: "Functions assigned at class level become methods." Not necessarily. If you try to evaluate `A.c` in Python 2.x, it evaluates to an "unbound method" object, which enforces that when called, the first argument is an instance of `A`. If you try to evaluate `A.c` in Python 3.x, you get the original `c` back. — newacct, Oct 28 '14 at 01:44

jfs · Answer 1 · answered Oct 26 '14 at 08:36

The default function.__get__ method does "copy" (it creates a new method instance on every call):

/* Bind a function to an object */
static PyObject *
func_descr_get(PyObject *func, PyObject *obj, PyObject *type)
{
    if (obj == Py_None || obj == NULL) {
        Py_INCREF(func);
        return func;
    }
    return PyMethod_New(func, obj);
}
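The C function above can be observed from Python by calling __get__ on the raw function directly. A minimal sketch, assuming CPython 3; the class A mirrors the question, with a self parameter added so the bound method is callable:

```python
import types

class A:
    c = lambda self: 0

a = A()
f = A.__dict__["c"]   # the raw function object stored on the class

# obj is None (class access in Python 3): the function itself comes back.
print(f.__get__(None, A) is f)             # True

# obj is an instance: PyMethod_New builds a new method object on every call.
m1 = f.__get__(a, A)
m2 = f.__get__(a, A)
print(type(m1) is types.MethodType)        # True
print(m1 is m2)                            # False
print(m1.__func__ is f, m1.__self__ is a)  # True True
```

Evaluating a.c performs the same f.__get__(a, A) call, which is why a.c is a.c is False.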

However, you can define a descriptor that doesn't copy:

from functools import partial

class D:
    def __init__(self, function, cached=False):
        self.function = function
        if cached:
            self.cache = {}
        else:
            self.cache = None

    def __get__(self, instance, klass):
        if instance is None: # C.m
            return self.function
        if self.cache is None: # no cache
            m = partial(self.function, instance)
        else:
            m = self.cache.get(instance)
            if m is None:
                m = self.cache[instance] = partial(self.function, instance)
        m.__self__ = instance
        return m # C().m

class C:
    m = D(print)
    cached = D(print, cached=True)

assert C.m is C.m
assert C.cached is C.cached
c = C()
assert c.m is not c.m
assert c.cached is c.cached

It might be simpler and more efficient to recreate the method each time .__get__() is called than to maintain a (possibly weakref-based) instance -> method mapping and break the reference cycles it creates (through __self__ = instance) in order to avoid wasting memory.
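To make the memory trade-off concrete, here is a sketch of a different caching technique from the mapping above: a hypothetical non-data descriptor, CachedMethod (not a stdlib class), that stores the bound method in the instance __dict__, similar in spirit to functools.cached_property. It assumes Python 3.6+ for __set_name__. The cache creates exactly the cycle mentioned (instance -> method -> instance), so the instance is reclaimed only by the cyclic garbage collector:

```python
import gc
import types
import weakref

class CachedMethod:
    """Hypothetical non-data descriptor (illustration only)."""

    def __init__(self, func):
        self.func = func

    def __set_name__(self, owner, name):
        self.name = name  # attribute name, used as the instance __dict__ key

    def __get__(self, instance, owner=None):
        if instance is None:   # class access: return the plain function
            return self.func
        bound = types.MethodType(self.func, instance)
        # No __set__ is defined, so this is a non-data descriptor: the entry
        # stored in the instance __dict__ shadows it on every later lookup.
        instance.__dict__[self.name] = bound
        return bound

class A:
    @CachedMethod
    def c(self):
        return 0

a = A()
print(a.c is a.c)   # True: the second lookup finds the cached method
print(a.c())        # 0

# a.__dict__ -> bound method -> a is a reference cycle, so reference
# counting alone never frees a; gc.collect() does.
r = weakref.ref(a)
del a
gc.collect()
print(r() is None)  # True
```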

score 0 · Answer 2 · answered Oct 26 '14 at 04:38


I don't have a computer handy. What happens if you type print(id(a.b), id(a.b), id(a.c), id(a.c))? If the ids in the second pair differ, then separate objects are being created and we don't have a bug.

Jack Stout

Nice catch. It does print `55981320 55981320 5065800 5065800`. So, does the converse of your implication hold? Do we have a bug? :-) – Veky Oct 26 '14 at 04:40
I assume not, but perhaps. I don't often use lambdas and would act on the assumption that my understanding is buggy. – Jack Stout Oct 26 '14 at 04:45
Also, Ignacio Vazquez-Abrams is right. These aren't descriptors. – Jack Stout Oct 26 '14 at 04:47
It's _really_ weird. It seems like Python thinks id shouldn't change, and caches it somehow. id(a.c) == id(a.c) says True (and it's not a fluke, len({id(a.c) for _ in range(99)}) gives 1), but if I evaluate d = id(a.c) and after that d == id(a.c), I get False. So it seems we have, if not a bug, at least a very weird behaviour, and that's beside the original problem (why does a.c make a copy when it does make a copy). :-/ – Veky Oct 26 '14 at 04:51
Ok, reading what @Ignacio referred to, I realized what was going on in the above comment. But still I'd like to know why "Whenever you look up a method via class.name or instance.name, the method object is created a-new". Does it have to be like that for some feature to work, or is it just an implementation artefact (could be otherwise)? – Veky Oct 26 '14 at 05:03
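The id puzzle in the comments comes from object lifetimes: inside id(a.c) == id(a.c), the first temporary method is freed before the second is created, so CPython may hand out the same address again. A sketch, assuming CPython 3 (the equal-ids result is typical there but not guaranteed):

```python
class A:
    c = lambda self: 0

a = A()

# Each a.c lookup builds a temporary bound method; the first one is freed
# before the second exists, so its memory (and id) can be reused.
print(id(a.c) == id(a.c))   # often True in CPython; not guaranteed

# Keep both alive and they must have different ids: distinct objects.
m1, m2 = a.c, a.c
print(m1 is m2, id(m1) == id(m2))   # False False
```

Ids are only unique among objects that are alive at the same time, so comparing ids of temporaries says nothing about whether a copy was made.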

score 0 · Accepted Answer · answered Oct 30 '14 at 08:46

Here's what the Python 2.x language reference says about attributes on instances (this also applies to 3.x); see "Class instances" under "The standard type hierarchy" in the data model chapter:

When an attribute is not found there, and the instance’s class has an attribute by that name, the search continues with the class attributes. If a class attribute is found that is a user-defined function object or an unbound user-defined method object whose associated class is the class (call it C) of the instance for which the attribute reference was initiated or one of its bases, it is transformed into a bound user-defined method object whose im_class attribute is C and whose im_self attribute is the instance.

And for attributes on classes (this applies only to 2.x, not 3.x); see "Classes" in the same section:

When a class attribute reference (for class C, say) would yield a user-defined function object or an unbound user-defined method object whose associated class is either C or one of its base classes, it is transformed into an unbound user-defined method object whose im_class attribute is C.

So to answer your questions:

How does it decide? The specification specifically says "user-defined function object" or "unbound user-defined method object", so those are the only objects this rule applies to. It does not cover all callables: if you scroll up on the same page to the section on callable types, you will see many kinds of callable types, of which "User-defined functions" is only one.
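In CPython 3 terms, the distinction shows up as whether the class attribute's type implements __get__: functions created by def or lambda do, builtin functions such as len do not. A short sketch, assuming CPython 3 (the class and attribute names are illustrative):

```python
import types

class A:
    f = lambda self: 0   # user-defined function (types.FunctionType)
    g = len              # builtin function (types.BuiltinFunctionType)

a = A()

# The user-defined function is wrapped in a bound method on instance access.
print(isinstance(A.__dict__["f"], types.FunctionType))  # True
print(type(a.f) is types.MethodType)                    # True

# The builtin has no __get__, so it comes back unwrapped.
print(isinstance(len, types.BuiltinFunctionType))       # True
print(hasattr(len, "__get__"))                          # False
print(a.g is len)                                       # True
```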
Why a new method object each time? (First, a correction: for attribute access on instances, yes, a method object is created; for attribute access on classes, a method object is created only on Python 2.x. In Python 3.x you simply get back whatever you put there, without any wrapping.) To answer your question: presumably a Python implementation could return the same method object, but that would require some kind of caching or interning, which costs storage. The specification says nothing about this, and the current CPython implementation does not return the same object. You should not depend on it either way.
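Since identity of bound methods is unspecified, equality is the dependable comparison: two bound methods compare equal when they wrap the same function and are bound to the same instance. A sketch, assuming Python 3.x (in 2.x, A.c would be an unbound method instead of the plain function):

```python
class A:
    c = lambda self: 0

a, b = A(), A()

print(A.c is A.c)   # True in Python 3.x: class access returns the plain function
print(a.c == a.c)   # True: same __func__ and same __self__
print(a.c == b.c)   # False: bound to different instances
```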
