1

From the solution for recursive dictionaries found here, I saw the following Python recipe:

class RecursiveDict(dict):
    """Implementation of perl's autovivification feature."""
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

About the line value = self[key] = type(self)(): Does it have any advantages or disadvantages over the following code? Or is it just syntax?

self[key] = type(self)()
value = self[key]
Community
  • 1
  • 1
MikeRand
  • 4,788
  • 9
  • 41
  • 70
  • I do wonder why anyone would choose to make a variable `value` when they could just return `self[key]`. – Waleed Khan Feb 27 '13 at 22:03
  • 2
    I suppose because the programmer is sick of earlyoptimizationitis and doesn't want to (apparently) access the dictionary 2 times (as in `self[key] = type(self)()` and then `return self[key]`). – fog Feb 27 '13 at 22:04
  • missing method is call only when element is not in dict and fog is right: the element is created and default value assigned and returned. @Waleed: you can't return a = b(or if you can, what you return? The value of attribution? the value of b?) – cox Feb 28 '13 at 03:05

2 Answers2

4

It's mostly syntactic sugar, but not for the statements you list; instead, it's equivalent to

value = type(self)()
self[key] = value

To see the difference, type the following at your Python prompt:

>>> class FakeDict(object):
...  def __setitem__(self, k, v):
...   pass
...  def __getitem__(self, k):
...   raise KeyError("boom!")
... 
>>> d = FakeDict()
>>> x = d[1] = 42
>>> d[1] = 42
>>> x = d[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in __getitem__
KeyError: boom!

Of course, with a well-behaved dict, this won't matter, but

self[key] = type(self)()
value = self[key]

does perform a superfluous dict lookup in the second line so inside a loop the shorthand may give a performance benefit.

In the general case, it's even a bit more complicated than I spelled out above. The shorthand assignment is actually equivalent to

__some_temporary = type(self)()
value = __some_temporary
self[key] = __some_temporary

and the simplified form value = type(self)(); self[key] = value ensues only because value is a simple local variable. If value were replaced by an expression of the form container[key] that might fail, the equivalence would no longer hold.

Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • If you don't want to call `type()` two times or access the dict two times, what's wrong with the obvious `value = type(self)()`, then `self[key] = value`? – fog Feb 27 '13 at 22:08
  • @fog I'm not sure if that's the actual order of assignment. Will do an experiment to find out. – Fred Foo Feb 27 '13 at 22:09
  • I am pretty sure it is not (optimizations apart I bet the compiler uses a temporary register or variable). When manually converting expressions I just find more useful to present them in a way that makes obvious the side-effects (double calls and so on). – fog Feb 27 '13 at 22:13
  • @fog The CPython compiler hardly optimizes anything away, but it does seem to be the case that `value` is assigned first. Added a note at the end to respond to your remark in more detail. – Fred Foo Feb 27 '13 at 22:14
  • @fog: The CPython compiler doesn't compile anything to native code; it compiles to bytecodes in a stack-machine-based VM. (And the other compilers compile to JVM, CLI, or RPython bytecodes, which are also not native code.) – abarnert Feb 27 '13 at 22:15
  • @larsmans: But I think fog's larger point here is that the order of assignment doesn't matter here, so there is a dead-simple, easily-readable way to get the exact same effect without needing to call `type()` twice or look up the hash value twice. (And from his comment on the question, I'm pretty sure he recognizes that no sane person should care about optimizing those away… but some people do anyway.) – abarnert Feb 27 '13 at 22:16
  • @larsmans: And "a superfluous hash table lookup" effectively resolves to one `%` and one `+` in C, the cost of which is so small that it's pretty much impossible to measure amongst the cost of the rest of this function. – abarnert Feb 27 '13 at 22:20
  • @abarnert: yes, I know how Python bytecode works. :) Anyway your comment explains exactly what I meant. – fog Feb 27 '13 at 22:21
  • @abarnert That depends on the complexity of the key. The actual hash function, and possibly an `__hash__` method, is also called twice. A general-purpose container (which is what this code is part of) shouldn't do that. – Fred Foo Feb 27 '13 at 22:22
  • @larsmans: Ah, you meant the `dict` lookup, not just the `hash` lookup part of it. Well, that usually does involve a function call—but any built-in/stdlib type that's expensive to hash caches the value, and implements the function call in C, and of course uses a slot instead of a member dict, so it's still very little actual work. It's worth avoiding for simplicity and correctness reasons, but for performance, it's almost never going to be make a difference. – abarnert Feb 27 '13 at 22:26
  • I expanded a little bit my answer following this discussion. I hope it will be useful. – fog Feb 27 '13 at 22:33
  • @fog I just simplified my answer by swapping the specific and general cases for clarity. – Fred Foo Feb 27 '13 at 22:35
1

It is just syntax. Sometimes it makes easier to read the code, sometimes it doesn't.

But what happens? The other answer made me curious: I commented to it saying that the compier probably uses a "register or variable" and that's exactly what happens. Let's decompile a simple function:

>>> import dis
>>> def f():
...     a = b = c = 1
...     return a
... 
>>> dis.dis(f)
  2           0 LOAD_CONST               1 (1)
              3 DUP_TOP             
              4 STORE_FAST               0 (a)
              7 DUP_TOP             
              8 STORE_FAST               1 (b)
             11 STORE_FAST               2 (c)

  3          14 LOAD_FAST                0 (a)
             17 RETURN_VALUE        

The right-most value (1 in my case type(self)() in the real case) is loaded onto the stack (think "local variable") then duplicated onto the stack (in a stack-based VM operations consume the stack so if you want to "keep" a value you need to make multiple copies) then assigned, then duplicated, then assigned and so on. I.e., converted back to to Python this is something like:

def f():
    t = 1
    a = t
    b = t
    c = t
    return a

The compiler even preserves the order of the assignment going from left to right (a, b, c).

fog
  • 3,266
  • 1
  • 25
  • 31