9

You can define __slots__ in new-style Python classes using either a list or a tuple (or perhaps any iterable?). The type persists after instances are created.

Given that tuples are always a little more efficient than lists and are immutable, is there any reason why you would not want to use a tuple for __slots__?

>>> class foo(object):
...   __slots__ = ('a',)
... 
>>> class foo2(object):
...   __slots__ = ['a']
... 
>>> foo().__slots__
('a',)
>>> foo2().__slots__
['a']
ʞɔıu
  • As a side note, I suspect that you, like many people, assume that `__slots__` somehow gives you something more like a C struct under the covers, which is much faster and more compact, and that may even be why you're using it in the first place. If so: slots use descriptors, so accessing `foo.a` basically does `Foo.a.__get__(foo)` instead of the usual `foo.__dict__['a']`, which is generally _slower_ rather than faster. The advantage is saving a `dict` object for each instance, if you have lots of instances and few attributes (as the docs explicitly say), not efficiency. – abarnert Jan 03 '14 at 23:37
  • FYI, in Python 3.3 there is a new `dict` implementation (see [PEP412](http://www.python.org/dev/peps/pep-0412/)) which seems to make `__slots__` redundant. See [this question](http://stackoverflow.com/questions/13761423/does-pep-412-make-slots-redundant) for more info. – aquavitae Jan 28 '14 at 06:21
  • @aquavitae: I think the answer there is wrong; `__slots__` is still not redundant for most of the use cases where it's worth using. – abarnert Aug 03 '14 at 05:09

2 Answers

5

First, tuples aren't any more efficient than lists; they both support the exact same fast iteration mechanism from C API code, and use the same code for both indexing and iterating from Python.
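If you want to check this on your own machine, here is a minimal sketch using `timeit`. The sequence size, the repeat counts, and the helper name `time_iteration` are arbitrary choices for illustration, and the numbers will vary by interpreter version and hardware, so treat any small difference as noise rather than proof.

```python
import timeit

# The same 1000 elements, once as a list and once as a tuple.
items = list(range(1000))
as_list = items
as_tuple = tuple(items)

def time_iteration(seq, number=500, repeat=5):
    """Return the best-of-`repeat` time for iterating over seq `number` times."""
    timings = timeit.repeat(
        "for _ in seq: pass",
        globals={"seq": seq},
        number=number,
        repeat=repeat,
    )
    return min(timings)

list_time = time_iteration(as_list)
tuple_time = time_iteration(as_tuple)

print(f"list:  {list_time:.4f}s")
print(f"tuple: {tuple_time:.4f}s")
print(f"tuple/list ratio: {tuple_time / list_time:.3f}")
```

Taking the minimum over several repeats reduces the influence of background load, but it still only measures a tight loop over a sequence that stays in cache.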

More importantly, the __slots__ mechanism doesn't actually use the __slots__ member except during construction. This may not be that clearly explained by the documentation, but if you read all of the bullet points carefully enough the information is there.

And really, it has to be true. Otherwise, this wouldn't work:

class Foo(object):
    __slots__ = (x for x in ['a', 'b', 'c'] if x != 'b')

… and, worse, this would:

slots = ['a', 'b', 'c']
class Foo(object):
    __slots__ = slots
foo = Foo()
slots.append('d')
foo.d = 4

For further proof:

>>> a = ['a', 'b']
>>> class Foo(object):
...     __slots__ = a
...
>>> del Foo.__slots__
>>> foo = Foo()
>>> foo.d = 3
AttributeError: 'Foo' object has no attribute 'd'
>>> foo.__dict__
AttributeError: 'Foo' object has no attribute '__dict__'
>>> foo.__slots__
AttributeError: 'Foo' object has no attribute '__slots__'

So, that __slots__ member in Foo is really only there for documentation and introspection purposes. Which means there is no performance issue, or behavior issue, just a stylistic one.
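The snippets above can be combined into one runnable sketch. It assumes CPython, where each slot becomes a descriptor stored in the class `__dict__`; the class name `Bar` and the variable `names` are only there for illustration.

```python
class Foo(object):
    # A generator can be consumed only once; that happens at class creation.
    __slots__ = (x for x in ['a', 'b', 'c'] if x != 'b')

foo = Foo()
foo.a = 1
foo.c = 3

# The slots live on as descriptors in the class namespace.
print(type(Foo.__dict__['a']).__name__)  # 'member_descriptor' on CPython
print(Foo.a.__get__(foo, Foo))           # same value as foo.a

names = ['a', 'b', 'c']

class Bar(object):
    __slots__ = names

bar = Bar()
names.append('d')  # mutates Bar.__slots__, but creates no new slot

try:
    bar.d = 4
except AttributeError as exc:
    print("still no slot 'd':", exc)
```

After the `append`, `Bar.__slots__` lists `'d'` even though no such slot exists, which shows that the attribute is informational only once the class has been built.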

abarnert
  • 2
    "First, tuples aren't any more efficient than lists" -- playing around with timeit disproves this – ʞɔıu Jan 03 '14 at 23:48
  • 1
    @ʞɔıu: What are you doing with them, and what are you testing? `ll=[randrange(100) for _ in range(10000)]; tt=tuple(ll)` followed by `%timeit ll[-100]` and `%timeit tt[-100]` has tuple winning 4 times and losing 2 times, but never by more than 3.5% either way. – abarnert Jan 03 '14 at 23:49
  • 2
    `timeit.Timer("for i in x: pass", "x = (1, 2, 3, 4, 5, 6)")` is consistently about 20% faster than `timeit.Timer("for i in x: pass", "x = [1, 2, 3, 4, 5, 6]")` on my system – ʞɔıu Jan 03 '14 at 23:53
  • 1
    @ʞɔıu: With your exact code, on my linux box, I get 2.8% faster in 3.2.3, 3.2% slower in 2.7.3, 0.6% slower in 2.7.6 on one computer; on my Mac, 1.9% slower in 3.4b1, 1.3% slower in 3.3.2, 2.9% faster in 2.7.5. (All 64-bit CPython.) But the performance of a repeated loop over the same tiny sequence is unlikely to be meaningful in any real-world code anyway. (In real life, you're not going to have the whole sequence in cache for 99.99999% of the runs, for example.) – abarnert Jan 04 '14 at 00:16
  • 1
    @abarnert I get the same result of his first comment by using `%timeit` – Nava Aug 01 '14 at 05:44
  • Any claim that tuples are _not_ appreciably more efficient in both space and time than lists is spurious at best and dubious at worst. In particular, **tuple assignment is well-known to be an [order of magnitude](https://stackoverflow.com/a/68712/2809027) faster than list assignment.** This finding is _more-or-less_ invariant across platform and hardware. Why? Because tuple constants are stored directly in bytecode, whereas list constants are not. Cue ludicrous speed. – Cecil Curry Aug 05 '16 at 05:06
  • Likewise, the claim that "there is no performance issue" is equally erroneous. Use of `__slots__` versus `__dict__` typically increases **(A)** time performance by anywhere from 5% to 10% (_see [this](https://stackoverflow.com/a/14119024/2809027) and [this](https://stackoverflow.com/a/1336890/2809027)_) and **(B)** space performance by anywhere from ∞% to ∞^∞% (_see [this](http://tech.oyster.com/save-ram-with-python-slots)_) – which is to say, by several orders of magnitude. **This entire answer is frankly fallacious.** Your answer is bad and you should feel bad. – Cecil Curry Aug 05 '16 at 05:22
0

According to the Python docs:

This class variable can be assigned a string, iterable, or sequence of strings with variable names used by instances.

So, you can define it using any iterable. Which one you use is up to you, but in terms of which to "prefer", I would use a list.

First, let's look at what would be the preferred choice if performance were not an issue, which means making the same decision you would make between lists and tuples in any Python code. I would say a list, and the reason is that a tuple is designed to have semantic structure: it should semantically mean something that you stored an element as the first item rather than the second. For example, if you stored the first value of an (X,Y) coordinate tuple (the X) as the second item, you just completely changed the semantic value of the structure. If you rearrange the names of the attributes in the __slots__ list, you haven't semantically changed anything. Therefore, in this case, you should use a list.

Now, about performance. First, this is probably premature optimization. I don't know about the performance difference between lists and tuples, but I would guess there isn't any. But even assuming there is, it would really only come into play if the __slots__ variable is accessed many times.

I haven't actually looked at the code for when __slots__ is accessed, but I ran the following test:

print('Defining slotter..')
class Slotter(object):
    def __iter__(self):
        print('Looking for slots')
        yield 'A'
        yield 'B'
        yield 'C'

print('Defining Mine..')
class Mine(object):
    __slots__ = Slotter()

print('Creating first mine...')
m1 = Mine()
m1.A = 1
m1.B = 2

print('Creating second mine...')
m2 = Mine()
m2.A = 1
m2.C = 2

Basically, I use a custom class so that I can see exactly when the slots variable is actually iterated. You'll see that it is done exactly once, when the class is defined.

Defining slotter..
Defining Mine..
Looking for slots
Creating first mine...
Creating second mine...

Unless there is a case that I'm missing where the __slots__ variable is iterated again, I think that the performance difference can be declared negligible at worst.
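Because the iterable is consumed only once, the names may not be recoverable from `__slots__` itself (an iterator would be exhausted). Here is a sketch of one way to list them anyway, by looking for the slot descriptors on the class and its bases. The helper name `slot_names` is hypothetical, and the approach relies on CPython exposing slots as member descriptors, which `inspect.ismemberdescriptor` detects.

```python
import inspect

def slot_names(cls):
    """Collect the names of slot descriptors defined on cls and its bases."""
    names = []
    # Walk from the most basic class to the most derived one.
    for klass in reversed(cls.__mro__):
        for name, value in vars(klass).items():
            if inspect.ismemberdescriptor(value) and name not in names:
                names.append(name)
    return names

class Base(object):
    __slots__ = iter(['A', 'B'])  # exhausted once the class is created

class Child(Base):
    __slots__ = iter(['C'])

print(slot_names(Child))
```

Note that special entries such as `'__dict__'` or `'__weakref__'` in `__slots__` are not member descriptors, so this helper would not report them.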

Mark Hildreth
  • is there a way to get the list of attributes of an object with slots without accessing `__slots__` again? – ʞɔıu Jan 03 '14 at 23:50
  • re: "a tuple is design to have semantic structure: it should semantically mean something that you stored an element as the first item rather than the second" -- I would argue that this is a subset of the reasons why you would want to use a tuple, not the complete set. Another reason is mutability, and there is definitely no reason for `__slots__` to be mutable. – ʞɔıu Jan 04 '14 at 00:08
  • 2
    It seems completely bogus to me. `list` and `tuple` are both sequences, the order of both can be significant. If you really want a container that semantically implies order is insignificant, I suppose you could use `set` given that `__slots__` doesn't need to have duplicates. Even if you accept the homogeneous vs heterogeneous distinction (which IMO is a side-effect of variable-size vs fixed-size, but let's not go there), it doesn't follow that `list` implies *unordered* data. – Steve Jessop Jan 04 '14 at 00:13
  • 1
    As my answer explains, your actual slots aren't going to be mutable no matter what iterable you pass in. If using `tuple` misleads you into thinking that there's a difference that isn't there, that's reason enough to avoid it… @SteveJessop: I've actually used `set` a few times, and I think it reads pretty nicely that way, but I don't feel strongly enough to argue for it. – abarnert Jan 04 '14 at 00:18
  • @abarnert: yeah, if not for set literals I don't think `set` would even be a contender :-) – Steve Jessop Jan 04 '14 at 00:20
  • @SteveJessop: Sure, but we've had set literals (and comprehensions) since… 3.1 and 2.7? That's been more than enough time for me to get used to them, and once you do, a lot of things just look more right as a set than a list because you don't care about the order, or duplicates would be clearly an error, or you're modeling something that's mathematically set-like rather than sequence-like, etc. I think that may be the case here. There is no order to the actual descriptors that get created, duplicates would be an error, and in type-theory terms you're definitely defining a set… – abarnert Jan 07 '14 at 00:35
  • I would also accept using sets for `__slots__`. I will admit that I did a horrible job explaining why one would use a tuple rather than a list, I felt that the answer I pointed to did a good job of that. – Mark Hildreth Jan 06 '14 at 15:16
  • @abarnert: "duplicates would be clearly an error" -- if you have a *lot* of such uses then I think there's a case for a function that checks they're non-duplicate. For example `def nodupes(*args): result = set(args); assert(len(result) == len(args)); return result`. Or for the "mad plans" file, another literal+comprehension syntax: the more the merrier right? ;-) – Steve Jessop Jan 07 '14 at 00:14
  • @SteveJessop: Your function doesn't work with iterators; you need to keep track of the count as you go along. If only comprehensions still leaked variables to the outer scope, you could just use `enumerate` to do that efficiently. Of course I can't imagine when you'd ever want to create slots from an iterator _and_ verify no dups _and_ care about efficiency, but if so… any suggestions for your mad plan syntax? I think we're out of symbols unless you want to require APL keyboards, so… `{distinct int(x) for x in it}`? – abarnert Jan 07 '14 at 01:09
  • @abarnert: "iterable": it's just an example of one such function that might be useful. Taking `*args` is so it's called as `nodupes(1,2,3)` rather than `nodupes((1,2,3))`. "mad plan": Does CPython's parser easily admit context-sensitive keywords? If not I wouldn't want to have to make `distinct` a reserved word *everywhere* just to give it meaning there. I wonder can we unambiguously make `distinct_{` a token? Opens the door to a rich new world of literals for minor use cases. – Steve Jessop Jan 07 '14 at 01:19
  • Ah, curses. Even if `distinct_{` is OK for the literal syntax, the corresponding lazily-evaluated generator can't be `distinct_(` because it clashes with a function named `distinct_`. I'll take your syntax provided we don't end up writing Python that looks too much like SQL. – Steve Jessop Jan 07 '14 at 01:25
  • @SteveJessop: I'm pretty sure someone suggested adding a context-sensitive keyword in some previous PEP and Guido said it would only happen over his dead, re-animated, and re-killed body. Anyway, the tokenizer will generate a NAME for `distinct_` followed by an OP for `{`, the same way it tokenizes `foo[3]`, and I think that would be hard to change… but I think the parser could handle those two tokens specially. Although I guess doing so would mean any name handled specially is effectively a context-sensitive keyword, so… – abarnert Jan 07 '14 at 01:56
  • @abarnert: ah, I was thinking of disallowing space between the `_` and the `{`, so that it would be tokenized as a brand new kind of open-brace: a workaround for the fact that Python has already used all four/five open-close pairs of characters in ASCII. So `distinct_` wouldn't be a context-sensitive keyword as far as (any part of) the grammar is concerned, although GvR may of course still regard it morally as that. – Steve Jessop Jan 07 '14 at 02:04
  • @SteveJessop: Right, but that would require either (a) changing how the tokenizer handles bracketed expressions in general, (b) making curly braces different from other kinds of brackets, or (c) making name/identifier tokens look ahead, none of which seem feasible. My alternative to fix it up at the parser sounded much nicer when I started typing it, until I got to the end and realized I was suggesting a context-sensitive keyword to avoid needing a context-sensitive keyword. – abarnert Jan 07 '14 at 02:26
  • @abarnert: OK, so all solutions are horrible. We can expect no new kinds of literal prior to a *major* overhaul of Python. I haven't looked at the Python tokenizer, but I'm kind of scared to now, if it's difficult to add a new token type to it whose tokens just match an exact string. Still, I guess an obfuscated tokenizer is a great way to stop people adding new reserved words willy-nilly ;-) – Steve Jessop Jan 07 '14 at 02:30
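For reference, the `nodupes` idea from the comments can be made to work with arbitrary iterables, including iterators, by checking as it goes instead of comparing lengths at the end. This is only a sketch: taking a single iterable argument and raising `ValueError` (rather than asserting) are assumptions of this version, not something settled in the discussion, and the class `Point` is purely illustrative.

```python
def nodupes(names):
    """Return a set built from names, raising ValueError on the first duplicate."""
    seen = set()
    for name in names:
        if name in seen:
            raise ValueError("duplicate slot name: %r" % (name,))
        seen.add(name)
    return seen

class Point(object):
    # Works with a generator because each name is checked as it is consumed.
    __slots__ = nodupes(n for n in ('x', 'y'))

p = Point()
p.x, p.y = 1, 2
print(sorted(Point.__slots__))

try:
    nodupes(iter(['a', 'b', 'a']))
except ValueError as exc:
    print("rejected:", exc)
```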