28

If I have an expression that I wish to evaluate in Python, such as the expression for r in the code snippet below, will the Python interpreter be smart and reuse the subresult x+y+z, or just evaluate it twice? I'd also be interested to know if the answer to this question would be the same for a compiled language, e.g. C.

x = 1
y = 2
z = 3
r = (x+y+z+1) + (x+y+z+2)

It has been suggested that this question is similar to this question. I believe it is similar. However, the linked question is less of a 'minimal example'. Also, in the linked question there is no ambiguity as to the order of operations, whereas in examples like this one, where mathematics does not define an order for the operations, a better or worse job of optimisation may be done depending on the (ambiguous) order of the individual function calls. Consider `(a*b*b*a)*(a*b*b*a)*(b*a*a*b)`: there are nested repeated subexpressions, and depending on the order and amount of preprocessing, many different optimisations could be performed.

user189076
  • 439
  • 3
  • 7
  • 1
    Possible duplicate of [Does Python automatically optimize/cache function calls?](https://stackoverflow.com/questions/51803046/does-python-automatically-optimize-cache-function-calls) – Georgy Sep 29 '19 at 08:35
  • 1
    Checked with C++, it simply returns 15 because all values are known. If you make the variables dependent on some input so it can't optimize them away, it calculates (x+y+z+3). – Sebastian Wahl Oct 01 '19 at 13:48
  • 1
    @SebastianWahl Did you mean it will compute `2*(x + y + z) + 3`? Or something else? It would also be informative to indicate the compiler that you've used for checking that result. – a_guest Oct 02 '19 at 09:54
  • 1
    @a_guest I misread the assembly here, it adds the result of (x + y + z) with itself and adds 3 in one instruction if I understand correct. It was GCC and you can see it online here and try other compilers: https://godbolt.org/z/PwxaxK – Sebastian Wahl Oct 02 '19 at 11:55

4 Answers

20

You can check that with dis.dis. The output is:

  2           0 LOAD_CONST               0 (1)
              2 STORE_NAME               0 (x)

  3           4 LOAD_CONST               1 (2)
              6 STORE_NAME               1 (y)

  4           8 LOAD_CONST               2 (3)
             10 STORE_NAME               2 (z)

  5          12 LOAD_NAME                0 (x)
             14 LOAD_NAME                1 (y)
             16 BINARY_ADD
             18 LOAD_NAME                2 (z)
             20 BINARY_ADD
             22 LOAD_CONST               0 (1)
             24 BINARY_ADD
             26 LOAD_NAME                0 (x)
             28 LOAD_NAME                1 (y)
             30 BINARY_ADD
             32 LOAD_NAME                2 (z)
             34 BINARY_ADD
             36 LOAD_CONST               1 (2)
             38 BINARY_ADD
             40 BINARY_ADD
             42 STORE_NAME               3 (r)
             44 LOAD_CONST               3 (None)
             46 RETURN_VALUE

So it won't cache the result of the expression in parentheses. Though caching would be possible for that specific case, in general it is not, since custom classes can define `__add__` (or any other binary operation) to modify themselves, so evaluating the same expression twice need not give the same result. For example:

class Foo:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        self.value += 1
        return self.value + other

x = Foo(1)
y = 2
z = 3
print(x + y + z + 1)  # prints 8
print(x + y + z + 1)  # prints 9

If you have an expensive function of which you would like to cache the result, you can do so via functools.lru_cache for example.

On the other hand, the compiler will perform constant folding as can be seen from the following examples:

>>> import dis
>>> dis.dis("x = 'abc' * 5")
  1           0 LOAD_CONST               0 ('abcabcabcabcabc')
              2 STORE_NAME               0 (x)
              4 LOAD_CONST               1 (None)
              6 RETURN_VALUE
>>> dis.dis("x = 1 + 2 + 3 + 4")
  1           0 LOAD_CONST               0 (10)
              2 STORE_NAME               0 (x)
              4 LOAD_CONST               1 (None)
              6 RETURN_VALUE
a_guest
  • 34,165
  • 12
  • 64
  • 118
4

EDIT: This answer applies only to the default CPython interpreter of the Python language. It may not apply to other Python implementations that adopt JIT compilation techniques or use a restricted Python sublanguage that allows static type inference. See @Jörg W Mittag's answer for more details.

No, it will not. You can do this to see the compiled bytecode:

from dis import dis
dis("r=(x+y+z+1) + (x+y+z+2)")

Output:

          0 LOAD_NAME                0 (x)
          2 LOAD_NAME                1 (y)
          4 BINARY_ADD
          6 LOAD_NAME                2 (z)
          8 BINARY_ADD
         10 LOAD_CONST               0 (1)
         12 BINARY_ADD
         14 LOAD_NAME                0 (x)
         16 LOAD_NAME                1 (y)
         18 BINARY_ADD
         20 LOAD_NAME                2 (z)
         22 BINARY_ADD
         24 LOAD_CONST               1 (2)
         26 BINARY_ADD
         28 BINARY_ADD
         30 STORE_NAME               3 (r)
         32 LOAD_CONST               2 (None)
         34 RETURN_VALUE

This is partly because Python is dynamically typed, so the types of variables are not easily known at compile time. Moreover, the compiler has no way of knowing whether the `+` operator, which can be overloaded by user classes, has side effects. Consider the following simple example:

class A:
    def __init__(self, v):
        self.value = v

    def __add__(self, b):
        print(b)
        return self.value + b

x = A(3)
y = 4
r = (x + y + 1) + (x + y + 2)  # prints "4" twice; r == 17

For simple expressions, you can just save the intermediate results to a new variable:

z = x + y + 1
r = z + (z + 1)

For function calls, functools.lru_cache is another option, as already indicated by other answers.

GZ0
  • 4,055
  • 1
  • 10
  • 21
3

If I have an expression that I wish to evaluate in Python, such as the expression for r in the code snippet below, will the Python interpreter be smart and reuse the subresult x+y+z, or just evaluate it twice?

Which Python interpreter are you talking about? There are currently four production-ready, stable Python implementations in widespread use. None of those actually has a Python interpreter; every single one of them compiles Python.

Some of them may or may not be able to perform this optimization for at least some programs under at least some circumstances.

The Python Language Specification neither requires nor forbids this kind of optimization, so any specification-conforming Python implementation is allowed, but not required, to perform it.

I am pretty certain that, contrary to all the other answers which state that Python cannot do this, PyPy is capable of performing this optimization. And depending on which underlying platform you use, code executed with Jython or IronPython may benefit from it as well; e.g. I am 100% certain that the C2 compiler of Oracle HotSpot does perform this optimization.

I'd also be interested to know if the answer to this question would be the same for a compiled language […].

There is no such thing as a "compiled language". Compilation and interpretation are traits of the compiler or interpreter (duh!) not the language. Every language can be implemented by a compiler, and every language can be implemented by an interpreter. Case in point: there are interpreters for C, and conversely, every currently existing production-ready, stable, widely-used implementation of Python, ECMAScript, Ruby, and PHP has at least one compiler, many even have more than one (e.g. PyPy, V8, SpiderMonkey, Squirrelfish Extreme, Chakra).

A language is an abstract set of mathematical rules and restrictions written on a piece of paper. A language is neither compiled nor interpreted, a language just is. Those concepts live on different layers of abstraction; if English were a typed language, the term "compiled language" would be a type error.

I'd also be interested to know if the answer to this question would be the same for […] e.g. C.

There are many production-ready, stable C implementations in widespread use. Some of them may or may not be able to perform this optimization for at least some programs under at least some circumstances.

The C Language Specification neither requires nor forbids this kind of optimization, so any specification-conforming C implementation is allowed, but not required, to perform it.
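Whether a particular C toolchain actually performs the common subexpression elimination can be checked empirically, e.g. by compiling a probe like the following sketch with `gcc -O2 -S` (or on godbolt.org, as in the comments above) and inspecting the generated add instructions; the function name `r` is just an illustrative choice:

```c
/* The sum x + y + z appears twice; a CSE-capable optimizer should
 * compute it once and reuse the subresult.  Taking the inputs as
 * function parameters keeps the compiler from constant-folding the
 * whole expression away. */
int r(int x, int y, int z) {
    return (x + y + z + 1) + (x + y + z + 2);
}
```

With GCC at -O2, the emitted code matches the `2*(x + y + z) + 3` shape described in the comments above.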

Jörg W Mittag
  • 363,080
  • 75
  • 446
  • 653
  • Are there any references for the PyPy optimization? I cannot see it by using `dis.dis`. And I wonder how that could be done in Python without the knowledge of runtime types (or maybe the optimization only applies to the scenarios where the types can be clearly inferred?). In Python, type hints are just annotations and runtime checking is not enforced at all (see [more details](http://doc.pypy.org/en/latest/faq.html#would-type-annotations-help-pypy-s-performance)). – GZ0 Sep 29 '19 at 13:59
  • "Are there any references for the PyPy optimization? I cannot see it by using `dis.dis`" – `dis.dis` will only show you the optimizations that the Python-to-bytecode compiler makes, which is actually a pretty simple and "stupid" compiler that performs almost no optimizations. Also, it is a static ahead-of-time compiler, so it has to contend with all the limitations the Halting Problems and Rice's Theorem entail. However, in PyPy, Jython, and IronPython, that bytecode is usually compiled further to native machine code, and *that* compiler is much more sophisticated. – Jörg W Mittag Sep 29 '19 at 14:05
  • 1
    "And I wonder how that could be done in Python without the knowledge of runtime types" – What makes you think an optimizer doesn't have knowledge of runtime types? The whole reason why many modern high-performance language execution engines delay compilation until runtime is precisely *because* that way the optimizer has access not only to runtime types, but also to runtime profiling data, data access patterns, branch statistics, etc. Even further, a compiler that is capable of *de-optimization* (such as HotSpot's) can make *speculative unsafe optimizations* and just remove them again when … – Jörg W Mittag Sep 29 '19 at 14:06
  • 1
    … it realizes that its speculation was wrong. I.e. it could remove the common subexpression under the assumption that it doesn't have any side-effects *even if it can't prove that it doesn't have side-effects*, but monitor it for side-effects, and when it detects a side-effect, it just recompiles that particular piece of code *without* CSE. – Jörg W Mittag Sep 29 '19 at 14:08
  • Unlike statically-typed languages such as C / Java, Python is dynamically typed so inferring runtime types is a fairly complicated task. I just checked that it is possible for the [RPython subset of Python](https://rpython.readthedocs.io/en/latest/faq.html#what-is-this-rpython-language) which imposes more restrictions to allow type inferences. – GZ0 Sep 29 '19 at 14:18
  • If I delay compilation until runtime, I don't *need* to infer types. I can just *look* at them. They are right there. The program is already running, all the objects are there, they all have their types. All of the impossibility results in CS, Rice's Theorem, the Halting Problem, the Function Problem, etc. are about *statically* analyzing code. They simply don't apply at runtime! HotSpot, probably one of the fastest, most advanced, most sophisticated execution engines in the world, mostly ignores static type information present in the JVML bytecode after the verification stage. – Jörg W Mittag Sep 29 '19 at 14:20
  • I see. These are the benefits of JIT compilers. That deoptimization technique is new to me. Thanks much for the information. By the way, for PyPy specifically, it seems that it still relies on type inferences offered by RPython. – GZ0 Sep 29 '19 at 15:02
  • I have upvoted your answer and updated my answer to clarify its limitation as well as pointing to your answer. – GZ0 Sep 29 '19 at 15:45
  • I was aware that there was this issue. I decided not to be more specific in the question because everybody knows that 99% of the time C is compiled, and I (wrongly) assumed the same to be true about Python being interpreted. I wished to avoid verbosity in the question. I think this answer makes a helpful contribution. – user189076 Oct 02 '19 at 15:43
1

No, Python doesn't do that by default. If you need Python to preserve the result of a certain calculation for you, you need to explicitly tell it to do that; one way is to define a function and use functools.lru_cache (docs):

from functools import lru_cache

@lru_cache(maxsize=32)
def add3(x, y, z):
    return x + y + z

x = 1
y = 2
z = 3
r = (add3(x, y, z) + 1) + (add3(x, y, z) + 2)
yukashima huksay
  • 5,834
  • 7
  • 45
  • 78
  • _"No, python doesn't do that by default since it would use up too much memory ..."_ - Do you have a source for your claim that memory usage is the reason? Because honestly, I'm pretty sceptical of that. – marcelm Sep 28 '19 at 23:30
  • 2
    The reason is not that the common subexpression elimination “would use up too much memory”; it is that the transformation is not sound in Python because the language is too dynamic. – wchargin Sep 28 '19 at 23:30
  • @marcelm sorry. – yukashima huksay Sep 29 '19 at 04:53