6

How does method invocation work in Python? That is, how does the Python virtual machine interpret it?

Is it true that method resolution can be slower in Python than in Java? What is late binding?

What are the differences in the reflection mechanisms of these two languages? Where can I find good resources explaining these aspects?

Andrea Francia

3 Answers

8

Method invocation in Python consists of two distinct separable steps. First an attribute lookup is done, then the result of that lookup is invoked. This means that the following two snippets have the same semantics:

foo.bar()

method = foo.bar
method()
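As a minimal sketch of this equivalence (the class Foo and its method body are hypothetical, chosen only for illustration), the two forms can be compared directly:

```python
class Foo:
    def bar(self):
        return "called bar on %s" % type(self).__name__

foo = Foo()

# One-step form: lookup and call in a single expression.
direct = foo.bar()

# Two-step form: the lookup yields a bound method object, which is then called.
method = foo.bar
indirect = method()

print(direct == indirect)      # True
print(method.__self__ is foo)  # True: the bound method remembers foo
```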

Attribute lookup in Python is a rather complex process. Say we are looking up an attribute named attr on an object obj, which corresponds to the following expression in Python code: obj.attr

First obj's instance dictionary is searched for "attr", then the instance dictionary of the class of obj and the dictionaries of its parent classes are searched in method resolution order for "attr".

Normally, if a value is found on the instance, that value is returned. But if the lookup on the class results in a value that has both the __get__ and __set__ methods (to be exact, if a dictionary lookup on the value's class and parent classes has values for both those keys), then the class attribute is regarded as a "data descriptor". In that case the __get__ method of that value is called, passing in the object on which the lookup occurred, and the result of that call is returned. If the class attribute isn't found or isn't a data descriptor, the value from the instance's dictionary is returned.

If there is no value in the instance dictionary, then the value from the class lookup is returned, unless it happens to be a "non-data descriptor", i.e. it has the __get__ method. In that case the __get__ method is invoked and the resulting value is returned.
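The precedence rules described so far can be observed with a small experiment. This is only a sketch: the classes DataDesc, NonDataDesc and Demo are made up for the demonstration, and the instance dictionary is written to directly so that __set__ is bypassed.

```python
class DataDesc:
    """Data descriptor: defines both __get__ and __set__."""
    def __get__(self, obj, cls):
        return "from data descriptor"
    def __set__(self, obj, value):
        raise AttributeError("read-only")

class NonDataDesc:
    """Non-data descriptor: defines only __get__."""
    def __get__(self, obj, cls):
        return "from non-data descriptor"

class Demo:
    data = DataDesc()
    nondata = NonDataDesc()

d = Demo()
# Write straight into the instance dictionary, bypassing __set__.
d.__dict__["data"] = "from instance dict"
d.__dict__["nondata"] = "from instance dict"

print(d.data)          # from data descriptor (wins over the instance dict)
print(d.nondata)       # from instance dict (wins over the non-data descriptor)
print(Demo().nondata)  # from non-data descriptor (no instance value present)
```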

There is one more special case: if obj happens to be a class (an instance of the type type), then the instance value is also checked to see whether it is a descriptor, and invoked accordingly.

If no value is found on either the instance or its class hierarchy, and obj's class has a __getattr__ method, that method is called.
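A short sketch of the __getattr__ fallback, using a hypothetical class Fallback; note that the hook only runs for names the normal lookup could not find:

```python
class Fallback:
    present = "class attribute"

    def __getattr__(self, name):
        # Called only after the normal lookup has failed.
        return "computed %s" % name

f = Fallback()
print(f.present)  # class attribute (found normally, __getattr__ not called)
print(f.missing)  # computed missing (normal lookup failed, __getattr__ called)
```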

The following shows the algorithm as encoded in Python, effectively doing what the getattr() function would do (excluding any bugs that have slipped in).

NotFound = object() # A singleton to signify not found values

def lookup_attribute(obj, attr):
    class_attr_value = lookup_attr_on_class(obj, attr)

    if is_data_descriptor(class_attr_value):
        return invoke_descriptor(class_attr_value, obj, obj.__class__)

    if attr in obj.__dict__:
        instance_attr_value = obj.__dict__[attr]
        if isinstance(obj, type) and is_descriptor(instance_attr_value):
            return invoke_descriptor(instance_attr_value, None, obj)
        return instance_attr_value

    if class_attr_value is NotFound:
        getattr_method = lookup_attr_on_class(obj, '__getattr__')
        if getattr_method is NotFound:
            raise AttributeError("%r object has no attribute %r"
                                 % (type(obj).__name__, attr))
        return getattr_method(obj, attr)

    if is_descriptor(class_attr_value):
        return invoke_descriptor(class_attr_value, obj, obj.__class__)

    return class_attr_value

def lookup_attr_on_class(obj, attr):
    for parent_class in obj.__class__.__mro__:
        if attr in parent_class.__dict__:
            return parent_class.__dict__[attr]
    return NotFound

def is_descriptor(obj):
    # Any object whose class defines __get__ is a descriptor
    return lookup_attr_on_class(obj, '__get__') is not NotFound

def is_data_descriptor(obj):
    # A data descriptor defines __set__ in addition to __get__
    return is_descriptor(obj) and lookup_attr_on_class(obj, '__set__') is not NotFound

def invoke_descriptor(descriptor, obj, cls):
    descriptormethod = lookup_attr_on_class(descriptor, '__get__')
    return descriptormethod(descriptor, obj, cls)

What does all this descriptor nonsense have to do with method invocation, you ask? Well, the thing is that functions are also objects, and they happen to implement the descriptor protocol. If the attribute lookup finds a function object on the class, its __get__ method gets called and returns a "bound method" object. A bound method is just a small wrapper around the function object that stores the object that the function was looked up on, and when invoked, prepends that object to the argument list (where the self argument usually is for functions that are meant to be methods).

Here's some illustrative code:

class Function(object):
    def __get__(self, obj, cls):
        return BoundMethod(obj, cls, self.func)
    # Init and call added so that it would work as a function
    # decorator if you'd like to experiment with it yourself
    def __init__(self, the_actual_implementation):
        self.func = the_actual_implementation
    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

class BoundMethod(object):
    def __init__(self, obj, cls, func):
        self.obj, self.cls, self.func = obj, cls, func
    def __call__(self, *args, **kwargs):
        if self.obj is not None:
            return self.func(self.obj, *args, **kwargs)
        # Unbound case: the caller must pass the instance explicitly
        elif args and isinstance(args[0], self.cls):
            return self.func(*args, **kwargs)
        raise TypeError("Unbound method expects an instance of %s as first arg" % self.cls)
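The same behavior can be seen on a real function object by calling its __get__ by hand. This is a sketch for Python 3, and the class Greeter is a hypothetical example:

```python
class Greeter:
    def greet(self, name):
        return "hello %s from %s" % (name, type(self).__name__)

g = Greeter()

# The raw function lives in the class dictionary.
func = Greeter.__dict__["greet"]

# Calling its __get__ by hand produces the same kind of bound method
# that the expression g.greet produces.
bound = func.__get__(g, Greeter)

print(bound("world"))          # hello world from Greeter
print(bound.__self__ is g)     # True
print(bound.__func__ is func)  # True
```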

For method resolution order (which in Python's case actually means attribute resolution order) Python uses the C3 algorithm from Dylan. It is too complicated to explain here, so if you're interested see this article. Unless you are doing some really funky inheritance hierarchies (and you shouldn't), it is enough to know that the lookup order is left to right, depth first, and all subclasses of a class are searched before that class is searched.
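For a concrete feel of the lookup order, here is a small diamond hierarchy (the classes A, B, C and D are invented for this sketch). The __mro__ attribute shows the order in which the class dictionaries are searched:

```python
class A:
    def who(self):
        return "A"

class B(A):
    pass

class C(A):
    def who(self):
        return "C"

class D(B, C):
    pass

# Both subclasses of A (B and C) are searched before A itself.
print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
print(D().who())  # C
```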

Ants Aasma
  • Great answer and +1 for the example code. Can you discuss the uses of is_descriptor and is_data_descriptor? – Hernan Oct 07 '11 at 13:31
4

Names (methods, functions, variables) are all resolved by looking at the namespace. Namespaces are implemented in CPython as dicts (hash maps).

When a name is not found in the instance namespace (dict), Python looks in the class, and then in the base classes, following the method resolution order (MRO).

All resolution is done at run time.

You can play around with the dis module to see how that happens in bytecode.

Simple example:

import dis
a = 1

class X(object):
    def method1(self):
        return 15

def test_namespace(b=None):
    x = X()
    x.method1()
    print a
    print b

dis.dis(test_namespace)

Under Python 2, that prints:

  9           0 LOAD_GLOBAL              0 (X)
              3 CALL_FUNCTION            0
              6 STORE_FAST               1 (x)

 10           9 LOAD_FAST                1 (x)
             12 LOAD_ATTR                1 (method1)
             15 CALL_FUNCTION            0
             18 POP_TOP             

 11          19 LOAD_GLOBAL              2 (a)
             22 PRINT_ITEM          
             23 PRINT_NEWLINE       

 12          24 LOAD_FAST                0 (b)
             27 PRINT_ITEM          
             28 PRINT_NEWLINE       
             29 LOAD_CONST               0 (None)
             32 RETURN_VALUE        

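LOAD_GLOBAL and LOAD_ATTR are namespace lookups performed at run time. LOAD_FAST reads a local variable from an indexed slot, and LOAD_CONST loads a constant stored with the code object.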
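The example above uses Python 2 syntax. A rough Python 3 adaptation is sketched below; it replaces the print statements with a return (an assumption made only to keep the function short) and collects the opcode names rather than printing a listing, because the exact instructions differ between CPython versions.

```python
import dis

a = 1

class X:
    def method1(self):
        return 15

def test_namespace(b=None):
    x = X()
    x.method1()
    return a, b

# Collect the opcode names instead of printing the full listing,
# because the exact instructions differ between CPython versions.
opnames = [ins.opname for ins in dis.get_instructions(test_namespace)]
loads = sorted({name for name in opnames if name.startswith("LOAD_")})
print(loads)
```

Depending on the interpreter version, the method lookup shows up as LOAD_ATTR or LOAD_METHOD.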

nosklo
1

Is it true that method resolution can be slower in Python than in Java? What is late binding?

Late binding describes a strategy of how an interpreter or compiler of a particular language decides how to map an identifier to a piece of code. For example, consider writing obj.Foo() in C#. When you compile this, the compiler tries to find the referenced object and insert a reference to the location of the Foo method that will be invoked at runtime. All of this method resolution happens at compile time; we say that names are bound "early".

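By contrast, Python binds names "late". Method resolution happens at run time: when the call executes, the interpreter looks up the Foo attribute on the object by name, and if it's not there, an AttributeError is raised. Signatures play no part in the lookup; a mismatch in arguments only surfaces as a TypeError when the call is actually made.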
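A minimal sketch of late binding in Python, with a hypothetical class Duck: the same call site fails or succeeds depending on what the class looks like at the moment the call runs.

```python
class Duck:
    pass

def call_quack(obj):
    # The name "quack" is looked up only when this line runs.
    return obj.quack()

d = Duck()

failed = False
try:
    call_quack(d)
except AttributeError as exc:
    failed = True
    print("lookup failed at run time:", exc)

# Attach the method to the class after the instance already exists.
Duck.quack = lambda self: "Quack!"

print(call_quack(d))  # Quack!
```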

What are the differences in the reflection mechanisms of these two languages?

Dynamic languages tend to have better reflection facilities than static languages, and Python is very powerful in this respect. Still, Java has pretty extensive ways to get at the internals of classes and methods. Nevertheless, you can't get around the verbosity of Java; you'll write much more code to do the same thing in Java than you would in Python. See the java.lang.reflect API.
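To give a flavor of the Python side, here is a short sketch using the built-in getattr and vars functions and the standard inspect module. The Account class is a hypothetical example.

```python
import inspect

class Account:
    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance

acct = Account("alice")

# Look up and call a method by its name, given as a string.
method = getattr(acct, "deposit")
print(method(50))  # 50

# List the public functions defined on the class.
methods = [name for name, _ in inspect.getmembers(Account, inspect.isfunction)
           if not name.startswith("_")]
print(methods)  # ['deposit']

# Inspect a signature and the instance state.
print(inspect.signature(Account.deposit))  # (self, amount)
print(vars(acct))  # {'owner': 'alice', 'balance': 50}
```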

John Feminella