Reason for allowing Special Characters in Python Attributes

Question

I somewhat accidentally discovered that you can set 'illegal' attributes on an object using setattr. By illegal, I mean attributes whose names can't be retrieved with the traditional . operator (which goes through the __getattribute__/__getattr__ interface). They can only be retrieved via the getattr() builtin.

This, to me, seems rather astonishing, and I'm wondering if there's a reason for it, or if it's just something that was overlooked. Since there exists an operator for retrieving attributes, and a standard implementation of the __setattr__ interface, I would expect that implementation to allow only attribute names that can actually be retrieved normally. If you had some bizarre reason to want attributes with invalid names, you would have to implement your own interface for them.

Am I alone in being surprised by this behavior?

class Foo:
    "stores attrs"

foo = Foo()
setattr(foo, "bar.baz", "this can't be reached")
dir(foo)

This returns something that is both odd, and a little misleading: [...'__weakref__', 'bar.baz']

And if I want to access foo.bar.baz in the 'standard' way, I cannot. The inability to retrieve it makes perfect sense, but the ability to set it is surprising.

foo.bar.baz
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Foo' object has no attribute 'bar'
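As noted above, the value is still reachable, just not through the . syntax. A small sketch reusing the question's Foo class shows that getattr() retrieves it, and that "bar.baz" is stored as one flat attribute name rather than as a path of nested attributes:

```python
class Foo:
    "stores attrs"

foo = Foo()
setattr(foo, "bar.baz", "this can't be reached")

# getattr takes the name as a plain string, so any string works here.
print(getattr(foo, "bar.baz"))   # this can't be reached

# The dotted string is a single attribute name, not a path:
# there is no "bar" attribute for foo.bar.baz to start from.
print(hasattr(foo, "bar"))       # False
print(vars(foo))                 # {'bar.baz': "this can't be reached"}
```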

Is it simply assumed that, if you have to use setattr to set the attribute, you are going to reference it via getattr? At runtime this may not always be true, especially with Python's interactive interpreter, reflection, etc. It still seems very odd that this is permitted by default.

EDIT: A (very rough) example of what I would expect to see as the default implementation of setattr:

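import keyword

class Safe:
    "stores attrs"

    def __setattr__(self, attr, value):
        # Only accept names usable after the "." operator: a valid
        # identifier (no dots, hyphens or leading digits) that is not
        # a reserved keyword such as "class".
        if not attr.isidentifier() or keyword.iskeyword(attr):
            raise AttributeError("Invalid attribute name: %r" % attr)
        super().__setattr__(attr, value)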

This will not permit me to use invalid characters in my attribute names. Obviously, super() could not be used in the base object class itself, but this is just an example.

What version of Python are you using? You can also use `foo.__dict__['bar.baz']`... — Charles D Pantoga, Jul 08 '16 at 19:05
Python 3.5.0. Thanks for the alternative method. Is there any benefit to that method over getattr? Or just preference? — Keozon, Jul 08 '16 at 19:15
@Keozon -- `getattr` is recommended for normal use. It'll work for objects that don't have a `__dict__` for example. Such objects are rare (because their creation is generally discouraged), but they do exist and are useful for some purposes. — mgilson, Jul 08 '16 at 19:18
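To illustrate mgilson's point about objects without a __dict__, here is a sketch with a class that uses __slots__ (the class name Point is just an example): its instances have no __dict__, so the __dict__[...] lookup fails while getattr() still works.

```python
class Point:
    # __slots__ stores attributes in fixed slots instead of a per-instance dict.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)

print(getattr(p, "x"))           # 1
print(hasattr(p, "__dict__"))    # False

try:
    p.__dict__["x"]
except AttributeError as exc:
    print("no __dict__:", exc)
```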

score 5 · Accepted Answer · answered Jul 08 '16 at 19:16


I think that your assumption that attributes must be "identifiers" is incorrect. As you've noted, Python objects support arbitrary attributes (not just identifiers) because, for most objects, the attributes are stored in the instance's __dict__, which is a dict and therefore supports arbitrary string keys. However, to have an attribute-access operator at all, the set of names reachable through it has to be restricted so that the grammar can parse them.
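A short sketch of the mechanism described above, using a hypothetical empty class Bag: for an ordinary instance, vars(obj) is the very same dict as obj.__dict__, and a string key written there is immediately visible as an attribute, whatever characters it contains.

```python
class Bag:
    pass

obj = Bag()

# vars() returns the instance's attribute dict itself, not a copy.
print(vars(obj) is obj.__dict__)           # True

# Writing a key directly into the dict creates an attribute...
obj.__dict__["not an identifier!"] = 1
print(getattr(obj, "not an identifier!"))  # 1

# ...and setattr writes into the same dict.
setattr(obj, "x-y", 2)
print(obj.__dict__)
```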

Is it simply assumed that, if you have to use setattr to set the variable, you are going to reference it via getattr?

No. I don't think that's assumed. I think that the assumption is that if you're referencing attributes using the . operator, then you know what those attributes are. And if you have the ability to know what those attributes are, then you probably have control over what they're called. And if you have control over what they're called, then you can name them something that the parser knows how to handle ;-).
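The parser's side of this can be checked directly. The sketch below (written for a modern Python 3; the helper name dot_accessible is hypothetical, not a standard function) uses ast.parse to test whether obj.<name> parses as a single attribute access whose name is exactly name. A string like "bar.baz" does parse, but as a chain of two lookups, so it is rejected as well.

```python
import ast

def dot_accessible(name: str) -> bool:
    """Return True if `obj.<name>` parses as one attribute access named `name`.

    Hypothetical helper: it only asks the parser and never touches an object.
    """
    try:
        tree = ast.parse(f"obj.{name}", mode="eval")
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on some versions
        return False
    node = tree.body
    # Must be exactly Attribute(value=Name('obj'), attr=name); "bar.baz"
    # parses as a nested Attribute whose outer attr is just "baz".
    return (
        isinstance(node, ast.Attribute)
        and isinstance(node.value, ast.Name)
        and node.value.id == "obj"
        and node.attr == name
    )

for candidate in ["bar", "bar.baz", "class", "hello world", "x-y", "_private"]:
    print(f"{candidate!r:15} {dot_accessible(candidate)}")
```

Note that "x-y" parses too, but as a subtraction (obj.x - y), which is why the helper checks the shape of the tree rather than just catching SyntaxError.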

answered Jul 08 '16 at 19:16 · mgilson

Keys in a `dict` needn't be strings at all; you can mix and match key types in the same dictionary if you want to. – Mark Ransom Jul 08 '16 at 19:24
@MarkRansom -- Right. I wasn't trying to imply that `dict` can only hold strings. I only meant that if it is a string, a `dict` can hold it and `setattr` _does_ prevent non-strings from being set as attributes. – mgilson Jul 08 '16 at 19:26
@mgilson Thanks. You are right about my assumption, and probably right about the reason. What I struggle with is that, in my opinion, the thing that makes Python such a great language is how every operator is implemented via some interface(s). As a result, I tend to think of the operators as the most elegant, or 'standard', way of leveraging an interface... so the 'standard' implementation of an interface should comply with the limitations of the operator. I will have to think about this. – Keozon Jul 08 '16 at 19:54
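The distinction mgilson draws in the comments above can be seen in a few lines (the class Bag is a throwaway example): setattr() refuses a non-string name, while writing the same key straight into __dict__ is not checked at all.

```python
class Bag:
    pass

obj = Bag()

rejected = False
try:
    setattr(obj, 12, 42)          # attribute names must be str
except TypeError as exc:
    rejected = True
    print("setattr rejected it:", exc)

obj.__dict__[12] = 42             # a plain dict accepts any hashable key
print(obj.__dict__)               # {12: 42}
```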

score 2 · Answer 2 · answered Oct 07 '16 at 20:40

I see that feature of the language as an unintended side-effect of how the language is implemented.

There are several issues which suggest the feature is a side-effect.

First, from the "Zen of Python":

There should be one-- and preferably only one --obvious way to do it.

For me, the obvious way to access an attribute is with the . operator. Thus, I consider names incompatible with the operator illegal, as they require "hacks" to use them.

Second, although we can have an integer key in the instance's __dict__ (as pointed out by Mark Ransom), I do not consider an int to be a valid attribute name, especially since it breaks the object's behaviour:

>>> class A:
...     pass
...
>>> a = A()
>>> a.__dict__[12] = 42
>>> dir(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < str()

Third, what the Python documentation claims about the equivalence of the . operator and the getattr() builtin is not completely true. The difference is in the resulting bytecode: the former compiles to a LOAD_ATTR instruction, while the latter compiles to a CALL_FUNCTION:

>>> import dis
>>> dis.dis(lambda x: x.a)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_ATTR                0 (a)
              6 RETURN_VALUE
>>> dis.dis(lambda x: getattr(x, 'a'))
  1           0 LOAD_GLOBAL              0 (getattr)
              3 LOAD_FAST                0 (x)
              6 LOAD_CONST               1 ('a')
              9 CALL_FUNCTION            2 (2 positional, 0 keyword pair)
             12 RETURN_VALUE

The same applies to the setattr() builtin (a function call instead of STORE_ATTR). Thus, I see the builtins as a kind of workaround introduced to facilitate dynamic attribute access (the builtin was absent in Python 0.9.1).
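The dis listing above comes from an older Python 3, and both the opcode names and the layout change between versions. As a version-independent sketch, the snippet below compares only opcode names via dis.get_instructions. Since the call opcode is named differently across releases (for example CALL_FUNCTION in older ones, CALL in newer ones), the assumption here is only that some opcode starting with "CALL" appears for the builtin forms, and that LOAD_ATTR/STORE_ATTR appear only for the dot forms.

```python
import dis

def opnames(func):
    """Return the opcode names the compiler emitted for func."""
    return [ins.opname for ins in dis.get_instructions(func)]

dot_get = opnames(lambda x: x.a)
builtin_get = opnames(lambda x: getattr(x, "a"))

def dot_set(x):
    x.a = 1

def builtin_set(x):
    setattr(x, "a", 1)

print(dot_get)                # includes LOAD_ATTR
print(builtin_get)            # a global lookup of getattr plus a CALL* opcode
print(opnames(dot_set))       # includes STORE_ATTR
print(opnames(builtin_set))   # a global lookup of setattr plus a CALL* opcode
```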

Finally, the following code (declaring __slots__ attributes) fails:

>>> class A(object):
...     __slots__ = ['a.b']
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __slots__ must be identifiers

which suggests that attribute names are supposed to be identifiers.
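A related consequence, sketched with hypothetical classes Open and Slotted: once a class uses __slots__ and its instances have no __dict__, setattr() can only store the declared slot names, so a name like 'a.b' is rejected at assignment time instead of being silently stored.

```python
class Open:
    pass

class Slotted:
    __slots__ = ("a",)

o = Open()
setattr(o, "a.b", 1)          # accepted: stored as a key in o.__dict__

s = Slotted()
setattr(s, "a", 1)            # fine: "a" is a declared slot

rejected = False
try:
    setattr(s, "a.b", 1)      # no matching slot and no __dict__ to fall back on
except AttributeError as exc:
    rejected = True
    print("rejected:", exc)
```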

However, since I cannot find any formal syntax for allowed attribute names, I also consider the point raised by @mgilson valid.
