In this post, Guido van Rossum says that a function call may be expensive, but I don't understand why, or how expensive it can be.
How much delay does a simple function call add to your code, and why?
A function call requires that the current execution frame be suspended and that a new frame be created and pushed onto the stack. That is relatively expensive compared to many other operations.
You can measure the exact time required with the timeit module:
>>> import timeit
>>> def f(): pass
...
>>> timeit.timeit(f)
0.15175890922546387
That's about 1/6th of a second for a million calls to an empty function. You'd compare that with the time required by whatever you are thinking of putting in the function; the 0.15 seconds only needs to be taken into account if performance is an issue.
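To put that in perspective, here's a hedged comparison (numbers are illustrative and vary by machine and Python version) of doing a trivial piece of work inline versus wrapped in a function call:

import timeit

inline = timeit.timeit("x = 1 + 1", number=1000000)
wrapped = timeit.timeit("add()", setup="def add(): x = 1 + 1", number=1000000)
print(inline)   # just the work
print(wrapped)  # the work plus a million call/return cycles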
Python has a "relatively high" function call overhead; it is the cost we pay for some of Python's most useful functionality.
Monkey Patching:
You have so much power to monkey-patch and override behavior in Python that the interpreter can't guarantee that, given
a, b = X(1), X(2)
return a.fn() + b.fn() + a.fn()
a.fn and b.fn resolve to the same function, or that a.fn will still be the same after b.fn() has been called.
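Here's a minimal sketch (class X, fn and sneaky are made up for illustration) of why the interpreter can't cache that lookup: calling b.fn() can legitimately rebind fn out from under a.

class X:
    def __init__(self, n):
        self.n = n
    def fn(self):
        return self.n

def sneaky(self):
    X.fn = lambda self: -self.n   # monkey-patch the class mid-expression
    return self.n

a, b = X(1), X(2)
b.fn = sneaky.__get__(b)          # override fn on this one instance only
print(a.fn() + b.fn() + a.fn())   # 1 + 2 + (-1): the two a.fn() calls differ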
In [1]: def f(a, b):
   ...:     return a.fn() + b.fn() + c.fn()
   ...:
In [2]: dis.dis(f)
1 0 LOAD_FAST 0 (a)
3 LOAD_ATTR 0 (fn)
6 CALL_FUNCTION 0
9 LOAD_FAST 1 (b)
12 LOAD_ATTR 0 (fn)
15 CALL_FUNCTION 0
18 BINARY_ADD
19 LOAD_GLOBAL 1 (c)
22 LOAD_ATTR 0 (fn)
25 CALL_FUNCTION 0
28 BINARY_ADD
29 RETURN_VALUE
In the above, you can see that 'fn' is looked up at each location. The same applies to variables, but people seem to be more aware of that.
In [11]: def g(a):
    ...:     return a.i + a.i + a.i
    ...:
In [12]: dis.dis(g)
2 0 LOAD_FAST 0 (a)
3 LOAD_ATTR 0 (i)
6 LOAD_FAST 0 (a)
9 LOAD_ATTR 0 (i)
12 BINARY_ADD
13 LOAD_FAST 0 (a)
16 LOAD_ATTR 0 (i)
19 BINARY_ADD
20 RETURN_VALUE
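The workaround described below applies here too; a hedged sketch (g2 is a made-up name): hoist the attribute into a local so three LOAD_ATTR lookups become one.

def g2(a):
    i = a.i           # one attribute lookup...
    return i + i + i  # ...then cheap local reads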
Worse, because modules can monkey-patch or replace themselves and each other, if you call a global or module-level function, the global/module has to be looked up every time:
In [16]: def h():
    ...:     v = numpy.vector(numpy.vector.identity)
    ...:     for i in range(100):
    ...:         v = numpy.vector.add(v, numpy.vector.identity)
    ...:
In [17]: dis.dis(h)
2 0 LOAD_GLOBAL 0 (numpy)
3 LOAD_ATTR 1 (vector)
6 LOAD_GLOBAL 0 (numpy)
9 LOAD_ATTR 1 (vector)
12 LOAD_ATTR 2 (identity)
15 CALL_FUNCTION 1
18 STORE_FAST 0 (v)
3 21 SETUP_LOOP 47 (to 71)
24 LOAD_GLOBAL 3 (range)
27 LOAD_CONST 1 (100)
30 CALL_FUNCTION 1
33 GET_ITER
>> 34 FOR_ITER 33 (to 70)
37 STORE_FAST 1 (i)
4 40 LOAD_GLOBAL 0 (numpy)
43 LOAD_ATTR 1 (vector)
46 LOAD_ATTR 4 (add)
49 LOAD_FAST 0 (v)
52 LOAD_GLOBAL 0 (numpy)
55 LOAD_ATTR 1 (vector)
58 LOAD_ATTR 2 (identity)
61 CALL_FUNCTION 2
64 STORE_FAST 0 (v)
67 JUMP_ABSOLUTE 34
>> 70 POP_BLOCK
>> 71 LOAD_CONST 0 (None)
74 RETURN_VALUE
WORKAROUND
Consider capturing or importing any values you expect not to mutate:
import os.path

def f1(files):
    for filename in files:
        if os.path.exists(filename):
            yield filename

# vs

def f2(files):
    from os.path import exists
    for filename in files:
        if exists(filename):
            yield filename

# or

def f3(files, exists=os.path.exists):
    for filename in files:
        if exists(filename):
            yield filename
See also the "In the wild" section below.
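As a rough, hedged comparison (file names and numbers are purely illustrative), you can time the variants:

import timeit

files = ["/etc/hosts", "/no/such/file"] * 1000
print(timeit.timeit(lambda: list(f1(files)), number=100))
print(timeit.timeit(lambda: list(f3(files)), number=100))
# f3 typically edges out f1: os.path.exists is resolved once, at definition
# time, instead of one global lookup plus two attribute lookups per file.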
It's not always possible to import, though; for example, you can import sys.stdin but you can't import sys.stdin.readline, and numpy types can have similar problems:
In [15]: def h():
    ...:     from numpy import vector
    ...:     add = vector.add
    ...:     idy = vector.identity
    ...:     v = vector(idy)
    ...:     for i in range(100):
    ...:         v = add(v, idy)
    ...:
In [16]: dis.dis(h)
2 0 LOAD_CONST 1 (-1)
3 LOAD_CONST 2 (('vector',))
6 IMPORT_NAME 0 (numpy)
9 IMPORT_FROM 1 (vector)
12 STORE_FAST 0 (vector)
15 POP_TOP
3 16 LOAD_FAST 0 (vector)
19 LOAD_ATTR 2 (add)
22 STORE_FAST 1 (add)
4 25 LOAD_FAST 0 (vector)
28 LOAD_ATTR 3 (identity)
31 STORE_FAST 2 (idy)
5 34 LOAD_FAST 0 (vector)
37 LOAD_FAST 2 (idy)
40 CALL_FUNCTION 1
43 STORE_FAST 3 (v)
6 46 SETUP_LOOP 35 (to 84)
49 LOAD_GLOBAL 4 (range)
52 LOAD_CONST 3 (100)
55 CALL_FUNCTION 1
58 GET_ITER
>> 59 FOR_ITER 21 (to 83)
62 STORE_FAST 4 (i)
7 65 LOAD_FAST 1 (add)
68 LOAD_FAST 3 (v)
71 LOAD_FAST 2 (idy)
74 CALL_FUNCTION 2
77 STORE_FAST 3 (v)
80 JUMP_ABSOLUTE 59
>> 83 POP_BLOCK
>> 84 LOAD_CONST 0 (None)
87 RETURN_VALUE
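Going back to the sys.stdin example, a hedged sketch of the same trick without an import (read_all is a made-up name):

import sys

def read_all():
    readline = sys.stdin.readline    # bind the bound method once, to a local
    for line in iter(readline, ""):  # readline returns "" at EOF
        ...                          # per-line work goes here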
CAVEAT EMPTOR:
Argument Passing
Python's argument-passing mechanism looks trivial, but unlike in most languages it costs a lot, because the interpreter has to separate the arguments into positional args and kwargs at every call:
f(1, 2, 3)
f(1, 2, c=3)
f(c=3)
f(1, 2)  # c takes its default value
There's a lot of work that goes on in the CALL_FUNCTION operation, including potentially one transition from the C layer to the Python layer and back.
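A hedged micro-benchmark sketch of how the call style itself affects that cost (numbers vary by machine and Python version):

import timeit

setup = "def f(a, b, c=3): pass"
print(timeit.timeit("f(1, 2, 3)", setup=setup))     # positional only
print(timeit.timeit("f(1, 2, c=3)", setup=setup))   # keyword argument
print(timeit.timeit("f(*(1, 2, 3))", setup=setup))  # tuple unpacking
# Keyword and */** calls typically add measurable overhead on top of the
# already non-trivial positional call.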
In addition to this, the parameters often need to be looked up to be passed:
f(obj.x, obj.y, obj.z)
Consider:
In [28]: def fn(obj):
    ...:     f = some.module.function
    ...:     for x in range(1000):
    ...:         for y in range(1000):
    ...:             f(x + obj.x, y + obj.y, obj.z)
    ...:
In [29]: dis.dis(fn)
2 0 LOAD_GLOBAL 0 (some)
3 LOAD_ATTR 1 (module)
6 LOAD_ATTR 2 (function)
9 STORE_FAST 1 (f)
3 12 SETUP_LOOP 76 (to 91)
15 LOAD_GLOBAL 3 (range)
18 LOAD_CONST 1 (1000)
21 CALL_FUNCTION 1
24 GET_ITER
>> 25 FOR_ITER 62 (to 90)
28 STORE_FAST 2 (x)
4 31 SETUP_LOOP 53 (to 87)
34 LOAD_GLOBAL 3 (range)
37 LOAD_CONST 1 (1000)
40 CALL_FUNCTION 1
43 GET_ITER
>> 44 FOR_ITER 39 (to 86)
47 STORE_FAST 3 (y)
5 50 LOAD_FAST 1 (f)
53 LOAD_FAST 2 (x)
56 LOAD_FAST 0 (obj)
59 LOAD_ATTR 4 (x)
62 BINARY_ADD
63 LOAD_FAST 3 (y)
66 LOAD_FAST 0 (obj)
69 LOAD_ATTR 5 (y)
72 BINARY_ADD
73 LOAD_FAST 0 (obj)
76 LOAD_ATTR 6 (z)
79 CALL_FUNCTION 3
82 POP_TOP
83 JUMP_ABSOLUTE 44
>> 86 POP_BLOCK
>> 87 JUMP_ABSOLUTE 25
>> 90 POP_BLOCK
>> 91 LOAD_CONST 0 (None)
94 RETURN_VALUE
Where "LOAD_GLOBAL" requires that the name be hashed and then the globals table be queried for that hash value. This is an O(log N) operation.
But think about this: for our two, simple, 0-1000 loops, we are doing that a million times...
LOAD_FAST and LOAD_ATTR are also hash table lookups, they're just restricted to specific hash tables. LOAD_FAST consults the locals() hash table, LOAD_ATTR consults the hash table of the object loaded last...
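A hedged illustration of the difference, with the builtin len standing in for any global name:

import timeit

# "len" goes through LOAD_GLOBAL (falling through to builtins) on every
# iteration; "f" is assigned in the setup, so inside timeit's timing loop
# it is a local and is read with LOAD_FAST.
print(timeit.timeit("len", number=10000000))
print(timeit.timeit("f", setup="f = len", number=10000000))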
But also note that we are calling a function there a million times. Fortunately, it's a built-in function and builtins have a much less prohibitive overhead; but if this was really your perf hotspot, you might want to consider optimizing the overhead of range by doing something like:
x, y = 0, 0
for i in range(1000 * 1000):
    ...
    y += 1
    if y > 1000:
        x, y = x + 1, 0
You could do some hacking on capturing variables, but it's likely to have minimal perf impact on this code and simply make it less maintainable.
But the core Pythonic fix to this problem is to pass whole iterables and generators around instead of calling the function once per element:
for i in obj.values():
    prepare(i)
# vs
prepare(obj.values())
and
for i in ("left", "right", "up", "down"):
test_move(i)
# vs
test_move(("left", "right", "up", "down"))
and
for x in range(-1000, 1000):
    for y in range(-1000, 1000):
        fn(x + obj.x, y + obj.y, obj.z)
# vs
def coordinates(obj):
    for x in range(obj.x - 1000, obj.x + 1000 + 1):
        for y in range(obj.y - 1000, obj.y + 1000 + 1):
            yield x, y, obj.z
fn(coordinates(obj))
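Note that this changes fn's signature; a hedged sketch of the callee side (fn and its body are hypothetical):

def fn(coords):
    # The call overhead is paid once; the per-point loop runs inside.
    for x, y, z in coords:
        ...  # do the per-point work here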
In the wild
You'll see these Pythonic optimizations in the wild in forms like this:
import sys

def some_fn(a, b, c, stdin=sys.stdin):
    ...
This has several advantages: sys.stdin is looked up once, when the function is defined, rather than on every call, and callers (tests in particular) can pass in a different file-like object without monkey-patching sys.stdin.
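For example (io.StringIO stands in for real input; this is just a usage sketch):

import io

some_fn(1, 2, 3)                                     # uses the captured sys.stdin
some_fn(1, 2, 3, stdin=io.StringIO("fake input\n"))  # tests substitute their own stream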
The same thinking applies to numpy: most numpy calls either take, or have a variant that takes, a list or array, and if you're not using those you're probably missing out on most of numpy's benefits.
import numpy

def distances(target, candidates):
    values = []
    for candidate in candidates:
        values.append(numpy.linalg.norm(candidate - target))
    return numpy.array(values)

# vs

def distances(target, candidates):
    # axis=1: one norm per row, i.e. one distance per candidate
    return numpy.linalg.norm(candidates - target, axis=1)
(Note: this isn't necessarily the best way to get distances, especially if you're not going to use the distance values elsewhere; e.g. if you're just doing a range check, it is usually more efficient to take a more selective approach that avoids the sqrt operation entirely.)
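A hedged sketch of that more selective approach (the names are made up):

import numpy

def within_range(target, candidates, radius):
    deltas = candidates - target
    # Compare squared distances against the squared radius: no sqrt needed.
    return (deltas * deltas).sum(axis=1) <= radius * radius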
Optimizing for iterables doesn't just mean passing them; it also means returning them:
def f4(files, exists=os.path.exists):
    return (filename for filename in files if exists(filename))
^- returns a generator expression, so nothing is checked until the caller iterates.
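A quick usage sketch (paths are illustrative):

for existing in f4(["/etc/hosts", "/no/such/file"]):
    print(existing)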
Any statement of the form "X is expensive" doesn't take into account that performance is always relative to whatever else is going on, and relative to however else the task can be done.
There are many questions on SO that express concern about things that might be, but typically are not, performance problems.
As to whether function calls are expensive, there's a general two-part answer.
For functions that do very little and do not call further sub-functions, and that in a particular application are responsible for more than 10% of total wall-clock time, it is worthwhile trying to in-line them or otherwise reduce the cost of invocation.
In applications containing complex data structures and/or tall abstraction hierarchies, function calls are expensive not because of the time they take, but because they tempt you to make more of them than strictly necessary. When this occurs over multiple levels of abstraction, the inefficiencies multiply together, producing a compounded slowdown that is not easily localized.
The way to produce efficient code is a posteriori, not a priori. First write the code so it is clean and maintainable, including function calls as you like. Then while it is running with a realistic workload, let it tell you what can be done to speed it up. Here's an example.