I'm reading Learning Python 5th edition and I need some more explanation on this paragraph:

The __add__ method of strings, for example, is what really performs concatenation; Python maps the first of the following to the second internally, though you shouldn't usually use the second form yourself (it's less intuitive, and might even run slower):

>>> S+'NI!'
'spamNI!'
>>> S.__add__('NI!')
'spamNI!'

so my question is, why would it run slower?

arianhf
  • Just disassemble the two expressions; you will be able to find the difference. – Abdul Niyas P M Jul 12 '19 at 09:37
  • Note that ``+`` and ``__add__`` are *not* equivalent. ``+`` will also invoke ``__radd__`` and may skip ``__add__`` entirely, in addition to re-interpreting the ``NotImplemented`` return value. – MisterMiyagi Jul 12 '19 at 10:05
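The point in the comment above can be illustrated with a small sketch (the class names here are made up for illustration):

```python
class Left:
    def __add__(self, other):
        # signal that Left does not know how to handle the addition
        return NotImplemented

class Right:
    def __radd__(self, other):
        return "handled by Right.__radd__"

# The + operator sees NotImplemented and falls back to the
# right operand's __radd__:
print(Left() + Right())           # handled by Right.__radd__

# Calling __add__ directly skips that fallback entirely:
print(Left().__add__(Right()))    # NotImplemented
```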

2 Answers

>>> def test(a, b):
...     return a + b
... 
>>> def test2(a, b):
...     return a.__add__(b)
... 
>>> import dis
>>> dis.dis(test)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_ADD          
              7 RETURN_VALUE        
>>> dis.dis(test2)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_ATTR                0 (__add__)
              6 LOAD_FAST                1 (b)
              9 CALL_FUNCTION            1
             12 RETURN_VALUE        

One BINARY_ADD instruction instead of two instructions (LOAD_ATTR and CALL_FUNCTION). And since BINARY_ADD does (almost) the same thing, but in C, we can expect it to be (slightly) faster. The difference will be hardly noticeable, though.

Side note: this is similar to how assembly works. Often, when there is a single instruction that does the same thing as a sequence of instructions, it will perform better. For example, in x64 the LEA instruction can be replaced with a sequence of other instructions, but they won't perform as well.

But there's a catch (which explains why I've started talking about x64 assembly). Sometimes a single instruction actually performs worse; see the infamous LOOP instruction. There may be many reasons for such counterintuitive behaviour: slightly different assumptions, an unoptimized implementation, historical reasons, a bug, and so on.

Conclusion: in Python, + should theoretically be faster than __add__, but always measure.
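One way to do that measuring, as a sketch (exact numbers will vary by machine and Python version, so no results are shown here):

```python
from timeit import timeit

S = "spam"
# Time both spellings over a million iterations each.
t_plus = timeit("S + 'NI!'", globals={"S": S}, number=10**6)
t_dunder = timeit("S.__add__('NI!')", globals={"S": S}, number=10**6)
print(f"+       : {t_plus:.3f}s")
print(f"__add__ : {t_dunder:.3f}s")
```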

freakish
  • _“always measure”_ – Totally agree, but when doing so, measure in the context of your application and don’t choose by the results of micro-benchmarks that just compare `+` and `__add__`. Chances are that there are a lot of “worse” performance issues than the difference between these two ways to add. – poke Jul 12 '19 at 10:13

It was probably explained that the + operator will actually call __add__ under the hood. So when you do S + 'NI!' then what happens under the hood is that __add__ is actually called (if S has one). So semantically, both versions do exactly the same thing.
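Restating the book's example with the variable actually bound (S = 'spam', as in the book), to show that all the spellings reach the same string method:

```python
S = 'spam'
# The + operator is mapped to the string's __add__ method internally.
print(S + 'NI!')               # spamNI!
print(S.__add__('NI!'))        # spamNI!
# The unbound form works too, since __add__ lives on the str type:
print(str.__add__(S, 'NI!'))   # spamNI!
```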

The difference is in what the code corresponds to though. As you probably know, Python is compiled into bytecode which is then executed. The bytecode operations are what determine what steps the interpreter has to execute. You can take a look at the bytecode with the dis module:

>>> import dis
>>> dis.dis("S+'NI!'")
  1           0 LOAD_NAME                0 (S)
              2 LOAD_CONST               0 ('NI!')
              4 BINARY_ADD
              6 RETURN_VALUE
>>> dis.dis("S.__add__('NI!')")
  1           0 LOAD_NAME                0 (S)
              2 LOAD_METHOD              1 (__add__)
              4 LOAD_CONST               0 ('NI!')
              6 CALL_METHOD              1
              8 RETURN_VALUE

As you can see, the difference here is basically that the + operator just does a BINARY_ADD while the __add__ call loads the actual method and executes it.

When the interpreter sees the BINARY_ADD it will automatically look up the __add__ implementation and call that, but it can do so more efficiently than when you have to look up the method within Python bytecode.
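One observable consequence of that faster route: for BINARY_ADD, the interpreter looks __add__ up on the *type*, not on the instance, so an __add__ attribute attached to an instance is ignored by + but found by an explicit call. A small sketch (the class is made up for illustration):

```python
class Box:
    def __add__(self, other):
        return "type __add__"

b = Box()
# Attach a shadowing __add__ directly on the instance.
b.__add__ = lambda other: "instance __add__"

print(b + 1)         # type __add__     (the operator uses the type's slot)
print(b.__add__(1))  # instance __add__ (explicit lookup finds the instance attribute)
```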

So basically, by calling __add__ explicitly, you are preventing the interpreter from going the faster route to the implementation.

That being said, the difference is negligible. If you time the difference between the two calls, you can see the difference but it is really not that much (this is 10M calls):

>>> from timeit import timeit
>>> timeit("S+'NI!'", setup='S = "spam"', number=10**7)
0.45791053899995404
>>> timeit("S.__add__('NI!')", setup='S = "spam"', number=10**7)
1.0082074819999889

Note that these results don’t always have to look like this. When timing a custom type (with a very simple __add__ implementation), the call to __add__ could turn out to be faster:

>>> timeit("S+'NI!'", setup='from __main__ import SType;S = SType()', number=10**7)
0.7971681049998551
>>> timeit("S.__add__('NI!')", setup='from __main__ import SType;S = SType()', number=10**7)
0.6606798959999196

The difference here is even smaller but + is slower.
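The SType class used in those timings isn't shown in the answer; a minimal guess at what such a class could look like, assuming the "very simple __add__ implementation" mentioned above, might be:

```python
class SType:
    def __add__(self, other):
        # The simplest possible implementation: delegate to str concatenation.
        return 'spam' + other

S = SType()
print(S + 'NI!')          # spamNI!
print(S.__add__('NI!'))   # spamNI!
```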

The bottom line is that you shouldn’t worry about these differences. Choose what is more readable, and almost all of the time that will be +. If you need to worry about performance, make sure to analyze your application as a whole, and don’t trust such micro-benchmarks. They aren’t helpful when looking at your application, and in 99.99% of cases, the difference between these two approaches will not matter. It’s much more likely that there is another bottleneck in your application that will slow it down more.

poke
  • In other words, `+` for builtin types is optimized up the wazoo. – deceze Jul 12 '19 at 09:47
  • Your last example (with custom `__add__`) made me very curious, so I've started digging into it. In theory, custom or not should not matter, right? I've done many tests, and sometimes `+` is faster and sometimes `__add__` is. It seems that the `__add__` implementation is irrelevant. But it heavily dominates the execution time when it is custom. And since the difference is so small (plus or minus 100 ms after 10M rounds on my machine), I think it is safe to assume that this is due to external (e.g. non-Python) factors. – freakish Jul 12 '19 at 10:56
  • @freakish I was actually thinking about this earlier too because I don’t have a perfect explanation for it yet. One reason could be because `+` actually does more than *just* call `__add__`; when we say that both are equivalent, it’s really just an oversimplification (since `+` will also look for other methods it could call). The other reason could be that the context switch from the native execution of the `BINARY_ADD` back to executing Python code is having some small performance impact there. – poke Jul 12 '19 at 15:54