30

In this post, Guido van Rossum says that a function call can be expensive, but I do not understand why, nor how expensive it can be.

How much delay does a simple function call add to your code, and why?

Eugene Morozov
VGonPa

3 Answers

33

A function call requires that the current execution frame is suspended, and a new frame is created and pushed on the stack. This is relatively expensive, compared to many other operations.

You can measure the exact time required with the timeit module:

>>> import timeit
>>> def f(): pass
... 
>>> timeit.timeit(f)
0.15175890922546387

That's about 1/6th of a second for a million calls to an empty function. You'd compare that with the time required by whatever you are thinking of putting in a function; the 0.15 seconds would need to be taken into account if performance is an issue.
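As a rough sketch (exact numbers vary by machine and Python version), you can compare the same work inlined versus wrapped in a call; the add_wrapped helper below exists only for this comparison:

import timeit

def add_wrapped(a, b):
    return a + b

setup = "x, y = 3, 4"
# the addition itself, and the same addition behind a function call
inline = timeit.timeit("x + y", setup=setup)
called = timeit.timeit("add_wrapped(x, y)", setup=setup, globals=globals())  # globals= needs Python 3.5+
print(inline, called)  # the called version is typically several times slower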

Martijn Pieters
  • When you say _relatively expensive compared to many other operations_, which kind of operations are you thinking of? E.g. creating a new variable in memory, assignments, statements, ... – VGonPa Apr 06 '14 at 11:19
  • That's way too broad; do your own timings for your specific problem instead. – Martijn Pieters Apr 06 '14 at 11:27
  • 3
    It is not that I have an specific problem, it is that I just want to understand why and in which cases is preferable to avoid function calling. Some timings: `timeit.timeit('"abc"+"def"') # 0.0361530780792` `timeit.timeit('a=[]') # 0.0726218223572` – VGonPa Apr 06 '14 at 11:49
  • 2
    `"abc" + "def"` is a constant lookup; the compiler will already have folded the concatenation. – Martijn Pieters Apr 06 '14 at 12:03
23

Python has a relatively high function-call overhead; it is the cost we pay for some of Python's most useful functionality.

Monkey Patching:

You have so much power to monkey-patch/override behavior in Python that the interpreter can't guarantee that given

 a, b = X(1), X(2)
 return a.fn() + b.fn() + a.fn()

a.fn() and b.fn() are the same, or that a.fn() will be the same after b.fn() has been called.

In [1]: def f(a, b):
   ...:     return a.fn() + b.fn() + c.fn()
   ...:

In [2]: dis.dis(f)
  1           0 LOAD_FAST                0 (a)
              3 LOAD_ATTR                0 (fn)
              6 CALL_FUNCTION            0
              9 LOAD_FAST                1 (b)
             12 LOAD_ATTR                0 (fn)
             15 CALL_FUNCTION            0
             18 BINARY_ADD
             19 LOAD_GLOBAL              1 (c)
             22 LOAD_ATTR                0 (fn)
             25 CALL_FUNCTION            0
             28 BINARY_ADD
             29 RETURN_VALUE

In the above, you can see that 'fn' is looked up at each location. The same applies to variables, but people seem to be more aware of that.

In [11]: def g(a):
    ...:     return a.i + a.i + a.i
    ...:

In [12]: dis.dis(g)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_ATTR                0 (i)
              6 LOAD_FAST                0 (a)
              9 LOAD_ATTR                0 (i)
             12 BINARY_ADD
             13 LOAD_FAST                0 (a)
             16 LOAD_ATTR                0 (i)
             19 BINARY_ADD
             20 RETURN_VALUE
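Nothing stops fn (or i) from changing between those loads. A toy sketch, with a made-up class X, of the kind of mutation the interpreter has to allow for:

class X:
    def __init__(self, n):
        self.n = n
    def fn(self):
        return self.n

a, b = X(1), X(2)
first = a.fn() + b.fn()   # 3

def patched(self):
    return 100
X.fn = patched            # monkey-patch the class between two uses

second = a.fn() + b.fn()  # 200 -- same expressions, different result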

Worse, because modules can monkey-patch/replace themselves and each other, if you are calling a global or module-level function, the global/module has to be looked up every time:

In [16]: def h():
    ...:     v = numpy.vector(numpy.vector.identity)
    ...:     for i in range(100):
    ...:         v = numpy.vector.add(v, numpy.vector.identity)
    ...:

In [17]: dis.dis(h)
  2           0 LOAD_GLOBAL              0 (numpy)
              3 LOAD_ATTR                1 (vector)
              6 LOAD_GLOBAL              0 (numpy)
              9 LOAD_ATTR                1 (vector)
             12 LOAD_ATTR                2 (identity)
             15 CALL_FUNCTION            1
             18 STORE_FAST               0 (v)

  3          21 SETUP_LOOP              47 (to 71)
             24 LOAD_GLOBAL              3 (range)
             27 LOAD_CONST               1 (100)
             30 CALL_FUNCTION            1
             33 GET_ITER
        >>   34 FOR_ITER                33 (to 70)
             37 STORE_FAST               1 (i)

  4          40 LOAD_GLOBAL              0 (numpy)
             43 LOAD_ATTR                1 (vector)
             46 LOAD_ATTR                4 (add)
             49 LOAD_FAST                0 (v)
             52 LOAD_GLOBAL              0 (numpy)
             55 LOAD_ATTR                1 (vector)
             58 LOAD_ATTR                2 (identity)
             61 CALL_FUNCTION            2
             64 STORE_FAST               0 (v)
             67 JUMP_ABSOLUTE           34
        >>   70 POP_BLOCK
        >>   71 LOAD_CONST               0 (None)
             74 RETURN_VALUE

WORKAROUND

Consider capturing or importing any values you expect not to mutate:

def f1(files):
    for filename in files:
        if os.path.exists(filename):
            yield filename

# vs

def f2(files):
    from os.path import exists
    for filename in files:
        if exists(filename):
            yield filename

# or

def f3(files, exists=os.path.exists):
    for filename in files:
        if exists(filename):
            yield filename

See also the "In the wild" section below.
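If you want to see the difference on your own machine, here is a rough sketch using the f1/f3 definitions above (the gap is real but small next to the actual filesystem calls, so expect modest numbers):

import os
import timeit

names = ["/tmp", "/no/such/path"] * 100

print(timeit.timeit(lambda: list(f1(names)), number=1000))
print(timeit.timeit(lambda: list(f3(names)), number=1000))  # usually slightly faster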

It's not always possible to import, though; for example, you can import sys.stdin but you can't import sys.stdin.readline, and numpy types can have similar problems:

In [15]: def h():
    ...:     from numpy import vector
    ...:     add = vector.add
    ...:     idy = vector.identity
    ...:     v   = vector(idy)
    ...:     for i in range(100):
    ...:         v = add(v, idy)
    ...:

In [16]: dis.dis(h)
  2           0 LOAD_CONST               1 (-1)
              3 LOAD_CONST               2 (('vector',))
              6 IMPORT_NAME              0 (numpy)
              9 IMPORT_FROM              1 (vector)
             12 STORE_FAST               0 (vector)
             15 POP_TOP

  3          16 LOAD_FAST                0 (vector)
             19 LOAD_ATTR                2 (add)
             22 STORE_FAST               1 (add)

  4          25 LOAD_FAST                0 (vector)
             28 LOAD_ATTR                3 (identity)
             31 STORE_FAST               2 (idy)

  5          34 LOAD_FAST                0 (vector)
             37 LOAD_FAST                2 (idy)
             40 CALL_FUNCTION            1
             43 STORE_FAST               3 (v)

  6          46 SETUP_LOOP              35 (to 84)
             49 LOAD_GLOBAL              4 (range)
             52 LOAD_CONST               3 (100)
             55 CALL_FUNCTION            1
             58 GET_ITER
        >>   59 FOR_ITER                21 (to 83)
             62 STORE_FAST               4 (i)

  7          65 LOAD_FAST                1 (add)
             68 LOAD_FAST                3 (v)
             71 LOAD_FAST                2 (idy)
             74 CALL_FUNCTION            2
             77 STORE_FAST               3 (v)
             80 JUMP_ABSOLUTE           59
        >>   83 POP_BLOCK
        >>   84 LOAD_CONST               0 (None)
             87 RETURN_VALUE

CAVEAT EMPTOR:

  • Capturing variables is not a zero-cost operation, and it increases the frame size;
  • only apply it after identifying hot code paths.

Argument Passing

Python's argument-passing mechanism looks trivial, but unlike in most languages it costs a lot. We're talking about separating arguments into args and kwargs:

f(1, 2, 3)
f(1, 2, c=3)
f(c=3)
f(1, 2)  # the default value of c is filled in

There's a lot of work that goes on in the CALL_FUNCTION operation, including potentially one transition from the C layer to the Python layer and back.
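You can get a feel for this with timeit; exact numbers vary, but keyword arguments and default fill-ins typically add measurable per-call work on top of a plain positional call (call_me below is only for illustration):

import timeit

def call_me(a, b, c=3):
    return a

print(timeit.timeit("call_me(1, 2, 3)", globals=globals()))    # positional only
print(timeit.timeit("call_me(1, 2, c=3)", globals=globals()))  # keyword argument
print(timeit.timeit("call_me(1, 2)", globals=globals()))       # default for c filled in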

In addition to this, the parameters often need to be looked up to be passed:

f(obj.x, obj.y, obj.z)

Consider:

In [28]: def fn(obj):
    ...:     f = some.module.function
    ...:     for x in range(1000):
    ...:         for y in range(1000):
    ...:             f(x + obj.x, y + obj.y, obj.z)
    ...:

In [29]: dis.dis(fn)
  2           0 LOAD_GLOBAL              0 (some)
              3 LOAD_ATTR                1 (module)
              6 LOAD_ATTR                2 (function)
              9 STORE_FAST               1 (f)

  3          12 SETUP_LOOP              76 (to 91)
             15 LOAD_GLOBAL              3 (range)
             18 LOAD_CONST               1 (1000)
             21 CALL_FUNCTION            1
             24 GET_ITER
        >>   25 FOR_ITER                62 (to 90)
             28 STORE_FAST               2 (x)

  4          31 SETUP_LOOP              53 (to 87)
             34 LOAD_GLOBAL              3 (range)
             37 LOAD_CONST               1 (1000)
             40 CALL_FUNCTION            1
             43 GET_ITER
        >>   44 FOR_ITER                39 (to 86)
             47 STORE_FAST               3 (y)

  5          50 LOAD_FAST                1 (f)
             53 LOAD_FAST                2 (x)
             56 LOAD_FAST                0 (obj)
             59 LOAD_ATTR                4 (x)
             62 BINARY_ADD
             63 LOAD_FAST                3 (y)
             66 LOAD_FAST                0 (obj)
             69 LOAD_ATTR                5 (y)
             72 BINARY_ADD
             73 LOAD_FAST                0 (obj)
             76 LOAD_ATTR                6 (z)
             79 CALL_FUNCTION            3
             82 POP_TOP
             83 JUMP_ABSOLUTE           44
        >>   86 POP_BLOCK
        >>   87 JUMP_ABSOLUTE           25
        >>   90 POP_BLOCK
        >>   91 LOAD_CONST               0 (None)
             94 RETURN_VALUE

Where "LOAD_GLOBAL" requires that the name be hashed and then the globals table be queried for that hash value. This is an O(log N) operation.

But think about this: for our two, simple, 0-1000 loops, we are doing that a million times...

LOAD_ATTR is also a hash-table lookup: it consults the __dict__ of the object just loaded (and possibly of its class). LOAD_FAST is cheaper; it indexes directly into the frame's array of local variables rather than hashing a name.

But also note that we are calling a function there a million times. Fortunately, it's a built-in function and builtins have a much less prohibitive overhead; but if this was really your perf hotspot, you might want to consider optimizing the overhead of range by doing something like:

x, y = 0, 0
for i in range(1000 * 1000):
    ....
    y += 1
    if y >= 1000:
        x, y = x + 1, 0

You could do some hacking on capturing variables, but it's likely to have minimal perf impact on this code and simply make it less maintainable.

But the core pythonic fix to this problem is to use generators or iterables:

for i in obj.values():
    prepare(i)

# vs

prepare(obj.values())

and

for i in ("left", "right", "up", "down"):
    test_move(i)

# vs

test_move(("left", "right", "up", "down"))

and

for x in range(-1000, 1000):
    for y in range(-1000, 1000):
        fn(x + obj.x, y + obj.y, obj.z)

# vs

def coordinates(obj):
    for x in range(obj.x - 1000, obj.x + 1000 + 1):
        for y in range(obj.y - 1000, obj.y + 1000 + 1):
            yield x, y, obj.z

fn(coordinates(obj))

In the wild

You'll see these Pythonic idioms in the wild in forms like this:

def some_fn(a, b, c, stdin=sys.stdin):
    ...

This has several advantages:

  • it documents the default in help() for this function (default input is sys.stdin),
  • it provides a hook for unit tests (see the sketch after this list),
  • it promotes sys.stdin to a local (LOAD_FAST vs LOAD_GLOBAL+LOAD_ATTR)
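For example, a unit test can substitute an in-memory stream without touching sys at all (a sketch; read_name is a made-up function):

import io
import sys

def read_name(stdin=sys.stdin):
    # the default captures sys.stdin once; tests can pass their own stream
    return stdin.readline().strip()

assert read_name(stdin=io.StringIO("Alice\n")) == "Alice"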

Most numpy calls either take, or have a variant that takes, a list, array, etc., and if you're not using those, you're probably missing out on 99% of numpy's benefits.

def distances(target, candidates):
    values = []
    for candidate in candidates:
        values.append(numpy.linalg.norm(candidate - target))
    return numpy.array(values)

# vs

def distances(target, candidates):
    # axis=1 gives one norm per candidate, matching the loop version above
    return numpy.linalg.norm(candidates - target, axis=1)

(Note: this isn't necessarily the best way to get distances, especially if you are not going to forward the distance values elsewhere; e.g. if you're doing a range check, it is probably more efficient to compare squared distances and avoid the sqrt operation.)
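For example, a range check can stay in squared space entirely (a sketch; assumes candidates is an (N, d) array and target a length-d vector):

import numpy

def within_range(target, candidates, radius):
    deltas = candidates - target
    # compare squared distances to the squared radius; no per-element sqrt
    return (deltas * deltas).sum(axis=1) <= radius * radius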

Optimizing for iterables doesn't just mean passing them, but also returning them:

def f4(files, exists=os.path.exists):
    return (filename for filename in files if exists(filename))  # returns a generator expression
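so callers can keep everything lazy, e.g. (process and all_files are placeholders):

for filename in f4(all_files):  # evaluated one file at a time, nothing materialized
    process(filename)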
kfsone
13

Any statement of the form "X is expensive" doesn't take into account that performance is always relative to whatever else is going on, and relative to however else the task can be done.

There are many questions on SO that express concern about things that might be, but typically are not, performance problems.

As to whether function calls are expensive, there's a general two-part answer.

  1. For functions that do very little and do not call further sub-functions, and that in a particular application are responsible for more than 10% of total wall-clock time, it is worthwhile trying to in-line them or otherwise reduce the cost of invocation.

  2. In applications containing complex data structures and/or tall abstraction hierarchies, function calls are expensive not because of the time they take, but because they tempt you to make more of them than strictly necessary. When this occurs over multiple levels of abstraction, the inefficiencies multiply together, producing a compounded slowdown that is not easily localized.

The way to produce efficient code is a posteriori, not a priori. First write the code so it is clean and maintainable, including function calls as you like. Then while it is running with a realistic workload, let it tell you what can be done to speed it up. Here's an example.

Mike Dunlavey
  • 4
    Python's "CALL_FUNCTION" operation is known to be over an order of magnitude more expensive than the equivalent "CALL" operation equivalent in most other languages. This is part because you must look up the function at every call, even in a loop; part because of Python's argument passing system; part because of decorators and part because of Python's monkey patching capabilities. That's why Python itself recommends you favor functions that take iterables over repeatedly calling a function. – kfsone Feb 04 '19 at 20:29
  • @kfsone: Thanks. I didn't know that. Of course, if you're only doing it once or a small number of times, it won't matter. But if it's in your inner loop, it will make a huge difference. (Of course, my favorite technique, (modest cough) random pausing, will spot it.) – Mike Dunlavey Feb 04 '19 at 22:52
  • 3
    I was recently working on a project that used "esrapy" to parse files. It took 5 minutes to parse 1200 small files. I changed one function and one call site to `x = call(iterable)` instead of `x = [call(i) for i in iterable]` -> 2 minutes. A few name hoists (`spacesRe = re.compile('\s+')` -> `spaces = re.compile('\s+').search`, `obj.info.array.add` in a loop -> `add = obj.info.array.add; for ...: add(...)`) and it was down to 50s. – kfsone Feb 05 '19 at 21:17
  • 2
    Downvoted because this answer mostly talks about when and how to optimize, while the question asked about why function calls are expensive in python (considering that in many languages function calls can be 0-cost, the question is understandable). – pro-gramer Nov 08 '20 at 10:55