
Running the code below, a direct comparison with an if statement comes out almost 4x faster than calling the max function.

I am trying to understand the reason behind this.

comparison : 0.63s, max : 2.3s

import time

if __name__ == '__main__':
    sim = 10**7

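    # time sim iterations of a bare if-comparison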
    s = time.time()
    for _ in range(sim):
        if 1 > 2:
            pass
    res1 = time.time()-s

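    # time sim calls to max()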
    s = time.time()
    for _ in range(sim):
        max(1, 2)
    res2 = time.time()-s

    print('comparison : {:.2}s, max : {:.2}s'.format(res1, res2))

1 Answer


Because `max` involves a dictionary lookup for the function name and then a function call, whereas the direct comparison operator does not.
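
One way to see this (a rough sketch of mine, not something the question's benchmark prints) is to disassemble both forms with the standard dis module: the bare comparison compiles to a single COMPARE_OP, while the max version needs a LOAD_GLOBAL to look the name up and then a call opcode.

import dis

def use_comparison(a, b):
    # compiles to COMPARE_OP plus a conditional jump, no name lookup
    if a > b:
        pass

def use_max(a, b):
    # compiles to LOAD_GLOBAL max followed by a call opcode
    # (exact opcode names vary between Python versions)
    max(a, b)

dis.dis(use_comparison)
dis.dis(use_max)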

max only starts to become interesting speed-wise when you have more elements to compare, as the sketch below suggests.
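
For instance (my own rough timing sketch, not from the original answer; exact numbers depend on the machine and Python version), with a list of a thousand elements the single max call amortizes its lookup and call overhead over a C-level loop and beats an explicit Python-level loop of comparisons:

import timeit

data = list(range(1000))

def loop_max(seq):
    # explicit Python-level loop of comparisons
    best = seq[0]
    for x in seq[1:]:
        if x > best:
            best = x
    return best

print(timeit.timeit(lambda: max(data), number=10_000))       # one lookup/call, loop runs in C
print(timeit.timeit(lambda: loop_max(data), number=10_000))  # bytecode-level comparisons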

Related / same speed difference:

    Two dictionary lookups actually; one in module globals, and when that fails, a second one in builtin scope. If you're calling `max` a lot in a given function, caching it (e.g. via a keyword only argument, a la `def myfunc(arg1, arg2, *, max=max):`) is a cheap hack to remove the lookup costs (replacing the two `dict` lookups of `LOAD_GLOBAL` with a single C array indexing operation from `LOAD_FAST`). The function call overhead is the larger cost though (generic function dispatch has a lot more overhead than anything that goes straight for specialized slotted functions, e.g. `<`). – ShadowRanger May 23 '19 at 19:27
  • yes, dictionary lookup is `O(1)`. Thanks for the extra detail though, interesting. – Jean-François Fabre May 23 '19 at 19:29
  • Well, it's all basically `O(1)` (aside from "how many items were passed to `max`). `dict` lookup is `O(1)` average case, but with a higher constant overhead than arrays and frequently one or two bucket collisions before it finds the right item. On my machine, a function that loads, but doesn't call, `max` 1000 times takes about 45 μs to run without caching in the arguments, about 25.5 μs with the `max=max` caching, and 21.5 μs if the 1000 long loop body is `pass` instead of `max`. So loading from the local 1000x costs ~4 μs, loading from builtin scope cost ~23.5 μs, nearly 6x higher overhead. – ShadowRanger May 23 '19 at 19:38
  • Though like I said, that overhead pales in comparison to actually calling it (having it call `max(1, 2)` increases the time to 206 μs uncached and 173 μs cached, so the benefit of caching is dwarfed by the overhead of calling the function). – ShadowRanger May 23 '19 at 19:39
  • now my answer feels undocumented :) Also `O(1)` doesn't mean that it isn't costly for super-long function names because hashing takes longer to compute (not the case for `max`, though). If you want speed, use a compiled language, or a framework like numpy or pandas where you pass one request for a lot of data, not small requests. – Jean-François Fabre May 23 '19 at 19:40
  • that also reminds me of `a <= b < c` versus `b in range(a,c)` – Jean-François Fabre May 23 '19 at 19:43
  • Well, ultra-detailed now, but CPython (and I'd assume most alternate interpreters, because it makes sense) interns all names during compilation, which also precomputes & caches the hash codes, so the name length doesn't matter for hashing. And being interned in all the `dict`s, the comparison operation ends up being identity based (assuming you haven't manually inserted an uninterned `str` into the `dict`s that implement globals and builtins), so the contents of the string are never actually read for comparison. And yeah, range testing is definitely similar to this case. – ShadowRanger May 23 '19 at 19:44
  • all this information in comments seems wasteful to me (because comments are, you know, comments and can easily be discarded). It would be better in a real answer (even if the OP didn't ask for that much detail). Thanks, very instructive anyway – Jean-François Fabre May 23 '19 at 19:47
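
As a rough illustration of the caching trick ShadowRanger describes in the comments above (my sketch; the timings will differ per machine), binding max as a keyword-only default turns the two dict lookups of LOAD_GLOBAL into a single LOAD_FAST:

import timeit

def uncached(n):
    for _ in range(n):
        max(1, 2)           # LOAD_GLOBAL: miss in module globals, then hit in builtins

def cached(n, *, max=max):  # keyword-only default binds the builtin max as a local
    for _ in range(n):
        max(1, 2)           # LOAD_FAST: one C array index into the frame's locals

print(timeit.timeit(lambda: uncached(10**5), number=10))
print(timeit.timeit(lambda: cached(10**5), number=10))

Either way, as noted in the comments, the function call overhead itself still dominates the lookup savings.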