Why is early return slower than else?

Question

This is a follow-up question to an answer I gave a few days back. Edit: it seems that the OP of that question already used the code I posted to him to ask the same question, but I was unaware of it. Apologies. The answers provided are different though!

Substantially I observed that:

>>> def without_else(param=False):
...     if param:
...         return 1
...     return 0
>>> def with_else(param=False):
...     if param:
...         return 1
...     else:
...         return 0
>>> from timeit import Timer as T
>>> T(lambda : without_else()).repeat()
[0.3011460304260254, 0.2866089344024658, 0.2871549129486084]
>>> T(lambda : with_else()).repeat()
[0.27536892890930176, 0.2693932056427002, 0.27011704444885254]
>>> T(lambda : without_else(True)).repeat()
[0.3383951187133789, 0.32756996154785156, 0.3279120922088623]
>>> T(lambda : with_else(True)).repeat()
[0.3305950164794922, 0.32186388969421387, 0.3209099769592285]

...or in other words: having the else clause is faster regardless of the if condition being triggered or not.

I assume it has to do with different bytecode generated by the two, but is anybody able to confirm/explain in detail?

EDIT: Seems not everybody is able to reproduce my timings, so I thought it might be useful to give some info on my system. I'm running Ubuntu 11.10 64 bit with the default python installed. python generates the following version information:

Python 2.7.2+ (default, Oct  4 2011, 20:06:09) 
[GCC 4.6.1] on linux2

Here are the results of the disassembly in Python 2.7:

>>> dis.dis(without_else)
  2           0 LOAD_FAST                0 (param)
              3 POP_JUMP_IF_FALSE       10

  3           6 LOAD_CONST               1 (1)
              9 RETURN_VALUE        

  4     >>   10 LOAD_CONST               2 (0)
             13 RETURN_VALUE        
>>> dis.dis(with_else)
  2           0 LOAD_FAST                0 (param)
              3 POP_JUMP_IF_FALSE       10

  3           6 LOAD_CONST               1 (1)
              9 RETURN_VALUE        

  5     >>   10 LOAD_CONST               2 (0)
             13 RETURN_VALUE        
             14 LOAD_CONST               0 (None)
             17 RETURN_VALUE

there was an identical question on SO I can not find now. They checked generated bytecode and there was one additional step. Differences observed were very dependent on the tester (machine, SO..), sometimes finding only very very small differences. — joaquin, Nov 25 '11 at 15:42
On 3.x, both produce **identical** bytecode save for some unreachable code (`LOAD_CONST(None); RETURN_VALUE` - but as stated, it's never reached) at the end of `with_else`. I highly doubt dead code makes a function faster. Could someone provide a `dis` on 2.7? — , Nov 25 '11 at 15:42
I was not able to reproduce this. Function with `else` and `False` was slowest of them all (152ns). Second fastest was `True` without `else` (143ns) and two other were basically the same (137ns and 138ns). I did not use default parameter and measured it with `%timeit` in iPython. — rplnt, Nov 25 '11 at 15:43
I can't reproduce those timings, sometimes the with_else is faster, sometimes this is the without_else version, it looks like they are pretty similar for me... — Cédric Julien, Nov 25 '11 at 15:56
@CédricJulien I added some infor on my system, if this helps — mac, Nov 25 '11 at 16:01
Added results of disassembly. I'm using Ubuntu 11.10, 64-bit, stock Python 2.7 - same configuration as @mac. I also concur that `with_else` is observably faster. — Chris Morgan, Nov 25 '11 at 16:09
Another fun piece of trivia to remind you of: [-= is generally faster than +=](http://stackoverflow.com/questions/1396564/why-is-subtraction-faster-than-addition-in-python). But when I was comparing results on my machine and my Dad's MacBook, for him += was generally faster... different software or hardware configurations can alter these issues drastically. — Chris Morgan, Nov 25 '11 at 16:16
@mac I was running it on Windows 7 x64 with Python 2.7. The bytecode is the same. — rplnt, Nov 25 '11 at 16:17
Thanks @Chris for the fine addition. If the downvoter is following the question: care to explain how I could improve it? — mac, Nov 25 '11 at 16:22
possible duplicate of [Why does removing the else slow down my code?](http://stackoverflow.com/questions/8203696/why-does-removing-the-else-slow-down-my-code) — joaquin, Nov 25 '11 at 18:13
@joaquin - It seems that the OP of the question to whom I replied snatched the code from my answer and turn it into a question (which is OK) without notifying me (which is not an obligation, but it would have been gracious to do). You can check from yourself: it's just copy-paste. Yet, you are right. It's a duplicate. :( — mac, Nov 25 '11 at 21:22
@mac, I just copied my answer onto the other question (in that case `fact2` has a hash conflict with `__name__`, but I also voted to close it as a duplicate). — Duncan, Nov 28 '11 at 09:00

Duncan · Accepted Answer · 2017-01-22T19:39:47.600

This is a pure guess, and I haven't figured out an easy way to check whether it is right, but I have a theory for you.

I tried your code and get the same of results, without_else() is repeatedly slightly slower than with_else():

>>> T(lambda : without_else()).repeat()
[0.42015745017874906, 0.3188967452567226, 0.31984281521812363]
>>> T(lambda : with_else()).repeat()
[0.36009842032996175, 0.28962249392031936, 0.2927151355828528]
>>> T(lambda : without_else(True)).repeat()
[0.31709728471076915, 0.3172671387005721, 0.3285821242644147]
>>> T(lambda : with_else(True)).repeat()
[0.30939889008243426, 0.3035132258429485, 0.3046679117038593]

Considering that the bytecode is identical, the only difference is the name of the function. In particular the timing test does a lookup on the global name. Try renaming without_else() and the difference disappears:

>>> def no_else(param=False):
    if param:
        return 1
    return 0

>>> T(lambda : no_else()).repeat()
[0.3359846013948413, 0.29025818923918223, 0.2921801513879245]
>>> T(lambda : no_else(True)).repeat()
[0.3810395594970828, 0.2969634408842694, 0.2960104566362247]

My guess is that without_else has a hash collision with something else in globals() so the global name lookup is slightly slower.

Edit: A dictionary with 7 or 8 keys probably has 32 slots, so on that basis without_else has a hash collision with __builtins__:

>>> [(k, hash(k) % 32) for k in globals().keys() ]
[('__builtins__', 8), ('with_else', 9), ('__package__', 15), ('without_else', 8), ('T', 21), ('__name__', 25), ('no_else', 28), ('__doc__', 29)]

To clarify how the hashing works:

__builtins__ hashes to -1196389688 which reduced modulo the table size (32) means it is stored in the #8 slot of the table.

without_else hashes to 505688136 which reduced modulo 32 is 8 so there's a collision. To resolve this Python calculates:

Starting with:

j = hash % 32
perturb = hash

Repeat this until we find a free slot:

j = (5*j) + 1 + perturb;
perturb >>= 5;
use j % 2**i as the next table index;

which gives it 17 to use as the next index. Fortunately that's free so the loop only repeats once. The hash table size is a power of 2, so 2**i is the size of the hash table, i is the number of bits used from the hash value j.

Each probe into the table can find one of these:

The slot is empty, in that case the probing stops and we know the value is not in the table.
The slot is unused but was used in the past in which case we go try the next value calculated as above.
The slot is full but the full hash value stored in the table isn't the same as the hash of the key we are looking for (that's what happens in the case of __builtins__ vs without_else).
The slot is full and has exactly the hash value we want, then Python checks to see if the key and the object we are looking up are the same object (which in this case they will be because short strings that could be identifiers are interned so identical identifiers use the exact same string).
Finally when the slot is full, the hash matches exactly, but the keys are not the identical object, then and only then will Python try comparing them for equality. This is comparatively slow, but in the case of name lookups shouldn't actually happen.

The length of the name alone may be a factor, considering that the hash code function scales linearly. — Chris Eberle, Nov 25 '11 at 18:12
@Chris, no the length of the string shouldn't be significant. The very first time you hash a string it will take time proportional to the string length but then the calculated hash is cached in the string object so subsequent hashes are O(1). — Duncan, Nov 25 '11 at 18:26
Fascinating! May I call you Sherlock? ;) Anyways I hope I won't forget to give you some additional points with a bounty as soon as the question is eligible. — Voo, Nov 26 '11 at 02:50
@Duncan - Accepted, as I can in fact replicate your results, thanks. However, can you just confirm that I understand correctly the following: if the hash of two function names are the same then python uses the name directly to find its code, while if it is unique it doesn't and it is this additional step that slow down one of the two forms? — mac, Nov 28 '11 at 09:46
@mac, not quite. I'll add a bit about hash resolution (was going to squeeze it into the comment but it's more interesting than I thought). — Duncan, Nov 28 '11 at 09:58
@Duncan - Thank you so much for having taken the time to illustrate the hash process. Top notch answer! :) — mac, Nov 28 '11 at 10:24
@Duncan what's the `i` in `use j % 2**i as the next table index;`? — Fabio, Jan 22 '17 at 03:11
@dietbacon, good question. The hash table size is `2**i`, so the hash is using the last `i` bits of `j`. — Duncan, Jan 22 '17 at 19:38
Note: these performance results cannot be consistently reproduced anymore, because since 2012 Python adds a random salt (chosen during startup) to the hashing function, in order to avoid *hash collision denial of service*. See: http://ocert.org/advisories/ocert-2011-003.html and https://docs.python.org/3/reference/datamodel.html#object.__hash__ — Denilson Sá Maia, Oct 25 '19 at 09:55
I'd expect the hash collision to add a constant overhead to the timing results, so why is the difference between `with_else` and `without_else` significantly smaller when `param=True`? I'd expect the overhead and hence the difference to be similar. — a_guest, Mar 11 '20 at 15:37

Why is early return slower than else?

1 Answers1

Linked