Does Python optimize away a variable that's only used as a return value?

Question

Is there any ultimate difference between the following two code snippets? The first assigns a value to a variable in a function and then returns that variable. The second function just returns the value directly.

Does Python turn them into equivalent bytecode? Is one of them faster?

Case 1:

def func():
    a = 42
    return a

Case 2:

def func():
    return 42

If you use `dis.dis(..)` on both you see that **there is a difference**, so yes. But in most *real world* applications the overhead of this compared to the delay of the processing in the function is not that much. — Willem Van Onsem, Apr 13 '17 at 11:22
There are two possibilities: (a) You are going to call this function many (i.e. at least a million) times in a tight loop. In that case you shouldn't be calling a Python function at all, but instead should vectorize your loop using something like the numpy library. (b) You are not going to call this function that many times. In that case the difference in speed between these functions is too little to be worth worrying about. — Arthur Tacca, Apr 13 '17 at 11:57
[This question is being discussed on meta.](https://meta.stackoverflow.com/questions/347869/do-we-encourage-nonsense-questions) — T.J. Crowder, Apr 13 '17 at 13:38

Dimitris Fasarakis Hilliard · Accepted Answer · 2017-05-22T05:01:40.313

No, it doesn't.

The compilation to CPython byte code is only passed through a small peephole optimizer that is designed to do only basic optimizations (See test_peepholer.py in the test suite for more on these optimizations).
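To give a feel for the kind of basic optimization meant here, the sketch below checks whether a constant expression such as 2 * 3 is folded at compile time. The function name folded is made up for illustration, and the stage at which folding happens has moved around between CPython versions, so treat this as a rough probe rather than a description of the peephole optimizer itself.

```python
import dis

def folded():
    # A constant expression that CPython can evaluate at compile time.
    return 2 * 3

# Collect the opcode names generated for the function.
ops = [instr.opname for instr in dis.get_instructions(folded)]
print(ops)

# If the expression was folded, no BINARY_* instruction should remain.
has_binary_op = any(op.startswith("BINARY_") for op in ops)
print("multiplication left in bytecode:", has_binary_op)
```

On CPython the multiplication should not appear in the bytecode, since the result is computed once during compilation. Removing a local variable that is only stored and then returned is not among these simple rewrites.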

To take a look at what's actually going to happen, use dis* to see the instructions generated. For the first function, containing the assignment:

from dis import dis
dis(func)
  2           0 LOAD_CONST               1 (42)
              2 STORE_FAST               0 (a)

  3           4 LOAD_FAST                0 (a)
              6 RETURN_VALUE

For the second function (named func2 here to tell the two apart):

dis(func2)
  2           0 LOAD_CONST               1 (42)
              2 RETURN_VALUE

Two more (fast) instructions are used in the first: STORE_FAST and LOAD_FAST. These store the value into, and then load it back from, the fastlocals array of the current execution frame. Then, in both cases, a RETURN_VALUE is performed. So the second is ever so slightly faster, because it has fewer instructions to execute.
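If you want to check this on your own interpreter, here is a minimal sketch that compares the opcode names of the two variants programmatically. The names with_variable, direct_return and opnames are made up for this example. The listings above come from an older CPython; newer releases may show extra or differently named instructions (for example a RESUME at the start), so the sketch only looks for an opcode whose name contains STORE_FAST rather than matching the listings exactly.

```python
import dis

def with_variable():
    a = 42
    return a

def direct_return():
    return 42

def opnames(func):
    """Return the list of opcode names CPython generated for func."""
    return [instr.opname for instr in dis.get_instructions(func)]

ops_with_variable = opnames(with_variable)
ops_direct = opnames(direct_return)

print(ops_with_variable)
print(ops_direct)

# The variant with the local variable carries the extra store/load pair.
print("extra instructions:", len(ops_with_variable) - len(ops_direct))
```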

In general, be aware that the CPython compiler is conservative in the optimizations it performs. It isn't and doesn't try to be as smart as other compilers (which, in general, also have much more information to work with). The main design goal, apart from obviously being correct, is to a) keep it simple and b) be as swift as possible in compiling these so you don't even notice that a compilation phase exists.

In the end, you shouldn't trouble yourself with small issues like this one. The benefit in speed is tiny, constant, and dwarfed by the overhead introduced by the fact that Python is interpreted.
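To see how small the difference is on your own machine, a rough timing sketch with the standard timeit module follows. The function names and the iteration count N are arbitrary choices for illustration. No numbers are shown because they depend on the interpreter version and hardware, and the difference can easily be lost in measurement noise.

```python
import timeit

def with_variable():
    a = 42
    return a

def direct_return():
    return 42

# Arbitrary iteration count; raise it for a steadier measurement.
N = 200_000

# Take the best of several runs to reduce the effect of background noise.
t_with_variable = min(timeit.repeat(with_variable, number=N, repeat=5))
t_direct = min(timeit.repeat(direct_return, number=N, repeat=5))

print(f"with variable: {t_with_variable / N * 1e9:.1f} ns per call")
print(f"direct return: {t_direct / N * 1e9:.1f} ns per call")
```

Most of each measurement is the cost of the function call itself, which is the same in both cases.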

*dis is a small Python module that disassembles your code; you can use it to see the Python bytecode that the VM will execute.

Note: As also stated in a comment by @Jorn Vernee, this is specific to the CPython implementation of Python. Other implementations might do more aggressive optimizations if they so desire; CPython doesn't.

Not a python person (c++) so I do not know how it works under the hood but shouldn't the first case get optimized to the second case? A decent C++ compiler would make that optimization. — NathanOliver, Apr 13 '17 at 11:46
@NathanOliver it really doesn't, Python will do as told here without even attempting to play it smart. — Dimitris Fasarakis Hilliard, Apr 13 '17 at 11:50
The fact that @NathanOliver's *perfectly reasonable and intelligent* guess at an answer to this question is completely wrong is, to my eyes, proof that this isn't a "self-explanatory", "nonsense", "stupid" question that can be answered by "taking a moment to think", as TigerhawkT3 would have us believe. It's a valid, interesting question that I wasn't certain of the answer to despite having been a professional Python programmer for years. — Mark Amery, Apr 13 '17 at 13:40
Python's compiler is at best 'conservative', not 'very conservative'. The main design goal is not to be "as swift as possible ... so you don't even notice that a compilation phase exists." That is secondary, after "keep it simple". A function with large constants like "1<<(2\*\*34)" and "b'x'*(2\*\*32)" take several seconds to compile, and generate GB-sized constants, even if the function is never run. The large string will even be discarded by the compiler. Proposed fixes for these cases have been rejected as they would make the compiler too complex. — Andrew Dalke, May 22 '17 at 01:54
@AndrewDalke thanks for the insiders comment on this, I tweaked the wording to address the issues you pointed out. — Dimitris Fasarakis Hilliard, May 22 '17 at 05:03

score 3 · Answer 2 · edited Apr 14 '17 at 09:11

Both are basically the same, except that in the first case the object 42 is simply assigned to a variable named a; in other words, a name (a) refers to a value (42). Technically, the assignment never copies any data; it only binds the name to the object.

On return, the object bound to the name a is returned in the first case, while the object 42 is returned directly in the second case.
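A small sketch of that point, using a list instead of 42 so that identity is easy to observe. The function names and the payload variable are made up for this example.

```python
def with_variable(obj):
    a = obj  # binds the name a to the same object; nothing is copied
    return a

def direct_return(obj):
    return obj

payload = [1, 2, 3]

# Both variants hand back the very same object that was passed in.
same_via_variable = with_variable(payload) is payload
same_direct = direct_return(payload) is payload

print(same_via_variable)  # True
print(same_direct)        # True
```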

For more reading, refer to this great article by Ned Batchelder.
