73

I just realized that doing

x.real*x.real+x.imag*x.imag

is three times faster than doing

abs(x)**2

where x is a numpy array of complex numbers. For code readability, I could define a function like

def abs2(x):
    return x.real*x.real+x.imag*x.imag

which is still far faster than abs(x)**2, but it is at the cost of a function call. Is it possible to inline such a function, as I would do in C using macro or using inline keyword?
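
The comparison described above can be sketched as follows. This is an illustration rather than the asker's actual benchmark: the array size, the random seed, and the repeat count are assumptions, and the timings will differ by machine and NumPy version.

```python
import timeit

import numpy as np


def abs2(x):
    """Squared magnitude, skipping the square root that abs() computes."""
    return x.real * x.real + x.imag * x.imag


# Assumed test data: 10,000 complex samples (the size is arbitrary).
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000) + 1j * rng.standard_normal(10_000)

# Both forms agree up to floating-point rounding.
same = np.allclose(abs2(x), abs(x) ** 2)

# Rough timings only; absolute numbers depend on the machine.
t_abs = timeit.timeit(lambda: abs(x) ** 2, number=200)
t_abs2 = timeit.timeit(lambda: abs2(x), number=200)
print(f"abs(x)**2: {t_abs:.4f} s   abs2(x): {t_abs2:.4f} s")
```

For arrays of this size the work inside NumPy usually dominates, so the extra Python-level call to `abs2` tends to be a small part of the total.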

Charles Brunet
  • 14
    If you need this kind of optimisations, you probably need to use something like Cython. – Sven Marnach Jun 22 '11 at 15:05
  • 7
    PyPy to the rescue! – Chris Morgan Jun 22 '11 at 15:07
  • 12
    If you care about such small optimisations, you should be using C, not python. python is not about speed, really. – pajton Jun 22 '11 at 15:07
  • 2
    Have you tried timing the statement vs. function call to see if there's really a difference? – Mark Ransom Jun 22 '11 at 15:18
  • 3
    In addition to the very correct and important comments above (seriously, listen to them), note that due to the dynamic nature of Python, the only time inlining could possibly happen is at runtime. This is one of the many optimizations PyPy does (although it doesn't have a remotely complete NumPy yet; but at least it's being worked on), and PyPy works best on idiomatic Python code, not on code written to shave tiny bits of time off execution overhead. –  Jun 22 '11 at 15:26
  • @vartec: I measured the same for small arrays (100 elements). For large arrays (10000 elements), however, he is probably right. – Ferdinand Beyer Jun 22 '11 at 15:45
  • 1
    Obviously extracting square root is much slower than doing two multiplications and one addition. Why not just `x*x.conj()`, by the way? – Alexey May 20 '18 at 09:17
  • 2
    @MarkRansom Python function calls are notoriously expensive, it's the price paid for the monkey-patching capabilities of Python, etc. https://stackoverflow.com/a/54524575/257645 – kfsone Feb 04 '19 at 21:30

7 Answers

43

Is it possible to inline such a function, as I would do in C using macro or using inline keyword?

No. Before reaching this specific instruction, Python interpreters don't even know if there's such a function, much less what it does.

As noted in the comments, PyPy will inline automatically (the above still holds: it "simply" generates an optimized version at runtime, benefits from it, and breaks out of it when it's invalidated). In this specific case that doesn't help, though, because the work on implementing NumPy for PyPy started only recently and isn't even at beta level to this day. But the bottom line is: Don't worry about optimizations on this level in Python. Either the implementations optimize it themselves or they don't, it's not your responsibility.
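
A minimal sketch of why ahead-of-time inlining is not possible: the name `abs2` is looked up each time the call runs, so rebinding it changes what an already-defined caller does. The function `caller` and the replacement body are hypothetical, chosen only for illustration.

```python
def abs2(z):
    return z.real * z.real + z.imag * z.imag


def caller(z):
    # The name abs2 is resolved when this line executes,
    # not when caller() is compiled.
    return abs2(z)


before = caller(3 + 4j)


def abs2(z):  # rebinding the same name to a different function
    return -1.0


after = caller(3 + 4j)

print(before, after)
```

Had the first `abs2` been inlined into `caller` at compile time, the second call would have returned the wrong result.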

  • 10
    +1 "Don't worry about optimizations on this level in Python. Either the implementations optimize it themselves or they don't, it's not your responsibility." – phant0m Jun 22 '11 at 15:43
  • 47
    @phant0m Not sure why you guys like that quote so much... It basically says you cannot optimize without making code ugly. I just had to inline several calls to make my program twice as fast. At least it was worth it... – Tomáš Zato Nov 21 '16 at 01:47
  • 30
    I also find it a bit hard to accept that last comment. It's nice and all that it's "not my responsibility", but at the end of the day, I can't tell my boss that it's somebody else's fault if my code misses performance targets. – zneak Nov 08 '17 at 23:55
33

Not exactly what the OP has asked for, but close:

Inliner inlines Python function calls. Proof of concept for this blog post

from inliner import inline

@inline
def add_stuff(x, y):
    return x + y

def add_lots_of_numbers():
    results = []
    for i in xrange(10):
        results.append(add_stuff(i, i+1))

In the above code the add_lots_of_numbers function is converted into this:

def add_lots_of_numbers():
    results = []
    for i in xrange(10):
        results.append(i + i + 1)

Anyone interested in this question, and in the complications involved in implementing such an optimizer in CPython, might also want to have a look at:

AXO
  • 2
    Sorry what is the difference between your solution and the question? – RogerS Oct 21 '20 at 05:17
  • 4
    @RogerS, the OP had asked about something similar to C macros (inline keyword) which are very flexible and efficient. This library has some [limitations](https://tomforb.es/automatically-inline-python-function-calls/#limitations) and has a startup time cost, but other than those, it does what the question asks. – AXO Oct 22 '20 at 16:52
10

I'll agree with everyone else that such optimizations will just cause you pain on CPython, and that if you care about performance you should consider PyPy (though our NumPy may be too incomplete to be useful). However, I'll disagree and say you can care about such optimizations on PyPy. Not this one specifically, since, as has been said, PyPy does that automatically; but if you know PyPy well you really can tune your code to make PyPy emit the assembly you want, not that you almost ever need to.

Alex Gaynor
8

No.

The closest you can get to C macros is a script (awk or other) that you may include in a makefile, and which substitutes a certain pattern like abs(x)**2 in your python scripts with the long form.
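
As a rough sketch of such a substitution step, written in Python instead of awk: the helper name `expand_abs2` and the regular expression are assumptions made for illustration, and the pattern only handles a bare identifier inside `abs(...)`, not arbitrary expressions.

```python
import re

# Matches abs(NAME)**2 where NAME is a plain identifier (an assumed restriction).
_PATTERN = re.compile(r"abs\((\w+)\)\s*\*\*\s*2")


def expand_abs2(source):
    """Textually replace abs(NAME)**2 with the long real/imag form."""
    def long_form(match):
        name = match.group(1)
        return f"({name}.real*{name}.real+{name}.imag*{name}.imag)"

    return _PATTERN.sub(long_form, source)


expanded = expand_abs2("y = abs(x)**2 + abs(w) ** 2")
print(expanded)
```

Because this is a purely textual rewrite, it knows nothing about scoping, strings, or comments, which is part of why the approach is fragile.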

Thaddee Tyl
  • 21
    ... which is a horrible idea, a lot of extra work and a decent chance of obscure breakage for nearly zero practical gain. –  Jun 22 '11 at 15:19
  • 1
    Python is not the fastest language there is anyway, which is ok because of its fast development cycles. Adding a "preprocessing" step for a new python project is indeed strongly discouraged. – Thaddee Tyl Jun 22 '11 at 15:26
  • 19
    He did not claim that this was a good idea. Technically, he is correct. – Ferdinand Beyer Jun 22 '11 at 15:30
7

Actually it might be even faster to calculate, like:

x.real** 2+ x.imag** 2

Thus, the extra cost of the function call will likely diminish. Let's see:

In []: n= 10000
In []: x= randn(n, 1)+ 1j* rand(n, 1)
In []: %timeit x.real* x.real+ x.imag* x.imag
10000 loops, best of 3: 100 us per loop
In []: %timeit x.real** 2+ x.imag** 2
10000 loops, best of 3: 77.9 us per loop

And encapsulating the calculation in a function:

In []: def abs2(x):
   ..:     return x.real** 2+ x.imag** 2
   ..: 
In []: %timeit abs2(x)
10000 loops, best of 3: 80.1 us per loop

Anyway (as others have pointed out), this kind of micro-optimization (in order to avoid a function call) is not really a productive way to write Python code.

eat
1

You can try to use lambda:

abs2 = lambda x : x.real*x.real+x.imag*x.imag

then call it by:

y = abs2(x)
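
A small check of whether a lambda differs from a `def` at the bytecode level. The names `foo_def` and `foo_lambda` are hypothetical; the sketch compares the compiled bytecode of the two and assumes CPython.

```python
import dis


def foo_def(bar):
    return bar


foo_lambda = lambda bar: bar

# A lambda is an ordinary function object; only its __name__ differs.
same_bytecode = foo_def.__code__.co_code == foo_lambda.__code__.co_code
print("identical bytecode:", same_bytecode)

# Inspect the instructions of the lambda for comparison.
dis.dis(foo_lambda)
```

Calling either one goes through the same function-call machinery, so a lambda does not avoid the call overhead.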
Linfeng Mu
  • 8
    Good thought but I just tried it... That **didn't improve performance at all:** `def foo(bar): return bar` vs `foo = lambda bar: bar` both execute in `57.5 nanoseconds` on my system. Measured with timeit. So lambdas are exactly like regular functions and their calls. At least on CPython 3.8. – Mitch McMabers Oct 07 '19 at 16:50
0

Python is a dynamic programming language. Luckily, Python compiles to bytecode before execution, so you can inline code. For simple solutions that don't require fat external packages, you can use Python's own standard-library functions:

from inspect import getsource

abs2 = lambda z : z.real * z.real + z.imag * z.imag

def loop (zz, zs):
  for z in zs:
    zz += abs2 (z)

print ( f"loop code:\n{getsource (loop)}" )

inlined = getsource (loop).replace ("abs2 (z)", getsource (abs2).split(":")[1] )

print ( f"inlined loop code:\n{inlined}" )

compiled = compile (inlined, '<string>', 'exec').co_code

def loop2 (zz, zs):
  for z in zs:
    zz += z.real * z.real + z.imag * z.imag

compiled2 = compile (getsource (loop2), '<string>', 'exec').co_code

print ( f"compiled loop  code: {compiled}" )
print ( f"compiled loop2 code: {compiled2}")

Note: this only supports one-line lambdas whose parameters have the same names as the variables passed at the call site. It is a simple and very hackish solution, but Python would not be much of an interpreted language if it did not support editing code at run time.
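
A hedged variant of the same idea that actually produces a callable: the source is kept in plain strings instead of being read with `getsource`, and the spliced text is run with `exec`. The names `LOOP_SRC`, `ABS2_BODY`, and `loop_inlined` are hypothetical, and the same restriction applies (the call-site variable must be named like the parameter).

```python
# Source of a loop that calls abs2(z); kept as a string so it can be edited.
LOOP_SRC = """
def loop(zs):
    total = 0.0
    for z in zs:
        total += abs2(z)
    return total
"""

# Body of the one-line function to splice in (parameter must be named z).
ABS2_BODY = "z.real * z.real + z.imag * z.imag"

# Textually replace the call with the parenthesized expression.
inlined_src = LOOP_SRC.replace("abs2(z)", "(" + ABS2_BODY + ")")
inlined_src = inlined_src.replace("def loop", "def loop_inlined")

# Compile and run the edited source to obtain the new function.
namespace = {}
exec(inlined_src, namespace)
loop_inlined = namespace["loop_inlined"]

result = loop_inlined([1 + 2j, 3 - 4j])
print(result)
```

The resulting function no longer refers to `abs2` at all, which can be checked through `loop_inlined.__code__.co_names`.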

areop-enap