
Following the question about chaining the `*=` and `+=` operators, and Tom Wojcik's comment ("Why would you assume `aaa *= 200` is faster than `aaa = aaa * 200`?"), I tested it in a Jupyter notebook:

%%timeit aaa = np.arange(1,101,1)
aaa *= 100

%%timeit aaa = np.arange(1,101,1)
aaa = aaa * 100

And I was surprised, because the first test takes longer than the second one: 1530 ns versus 952 ns, respectively. Why are these values so different?
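The notebook timings can be reproduced outside Jupyter with the stdlib `timeit` module; this is only a sketch, and the exact numbers will differ by machine and NumPy version:

```python
import timeit

# Each statement is timed against a 100-element array created once in setup,
# mirroring the %%timeit setup line of the notebook cells. Repeated in-place
# multiplies overflow and wrap for int64, which is harmless for timing.
setup = "import numpy as np; aaa = np.arange(1, 101, 1)"

t_inplace = timeit.timeit("aaa *= 100", setup=setup, number=100_000)
t_rebind = timeit.timeit("aaa = aaa * 100", setup=setup, number=100_000)

print(f"in-place : {t_inplace:.4f} s")
print(f"rebinding: {t_rebind:.4f} s")
```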

Stef1611
    If you reverse the order here, what are the results? – jarmod Apr 20 '21 at 14:11
  • @jarmod what do you mean "reverse the order"? It looks like those two timeit calls are independent – Pranav Hosangadi Apr 20 '21 at 14:14
  • @PranavHosangadi I think he meant to check if the calculation done in the first statement is used by the second. – Vishal Singh Apr 20 '21 at 14:15
  • TIO's python interpreter agrees: [Try it online!](https://tio.run/##K6gsycjPM/7/PzO3IL@oRCGvNLegUiGxWCGvgIsrMTFRwRbI0kssSsxLT9Uw1DE0AGJNrrT8IoVMhcw8Bai4ARhoWnEpKAD1aNkCBYBMZSDbFsQHcv//BwA "Python 3 – Try It Online") The second is indeed faster. – General Grievance Apr 20 '21 at 14:15
  • 3
    This is related to numpy. [Doesn't happen with regular ints or floats](https://i.stack.imgur.com/vs3AW.png) – Pranav Hosangadi Apr 20 '21 at 14:16
  • 4
    Changing the range to be `np.arange(1,10001,1)` actually reverses the results: `aaa*=100` is faster! So the in-place is still faster as the input grows in size. For small arrays, for some reason, creating a new array is more efficient... – Tomerikoo Apr 20 '21 at 14:22
  • 1
    The difference is that one modifies the data-structure itself (in-place operation) aaa*= 1 while the other just reassigns the variable a = a * 100, which I guess is the source of slower behavior. – MaPy Apr 20 '21 at 14:22
  • 2
    @MaPy I think you missed the point. The one assigning a new array is faster... – Tomerikoo Apr 20 '21 at 14:23
  • 1
    @python_user This graph indeed aligns with the results of some experiments I just did – Tomerikoo Apr 20 '21 at 14:28
  • @Tomerikoo my results indicates that aaa *= 200 is faster than aaa = aaa * 200, (Python 3.8), that's the source of confusion. – MaPy Apr 20 '21 at 14:31
  • 1
    @MaPy Please see my comment above (and the linked question). This depends on the input size. For small arrays, as in the question itself, reassigning is somehow faster. As the array grows in size, indeed as expected the in-place operation is faster (by a small margin) – Tomerikoo Apr 20 '21 at 14:36
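Tomerikoo's observation that the winner depends on array size can be checked with a quick sweep. This is a sketch using the stdlib `timeit` module; which form wins at each size, and where the crossover happens, varies across machines and NumPy versions:

```python
import timeit

# Compare both forms at several array sizes. The repetition count is scaled
# down as n grows so each size takes roughly comparable wall time.
for n in (100, 10_000, 1_000_000):
    setup = f"import numpy as np; aaa = np.arange(1, {n} + 1, 1)"
    reps = max(1, 1_000_000 // n)
    t_inplace = timeit.timeit("aaa *= 100", setup=setup, number=reps)
    t_rebind = timeit.timeit("aaa = aaa * 100", setup=setup, number=reps)
    print(f"n={n:>9}: in-place {t_inplace:.5f} s, rebind {t_rebind:.5f} s")
```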

2 Answers


TL;DR: this question comes down to the performance difference between an in-place binary operation (the `INPLACE_*` bytecodes, as in `aaa *= 100`) and a plain binary operation (the `BINARY_*` bytecodes, as in `aaa = aaa * 100`). The difference can be seen with the `dis` module:

import numpy as np
import dis

aaa = np.arange(1,101,1)
dis.dis('''
for i in range(1000000):
  aaa *= 100
''')
# output (trimmed to the loop body):
#   3          14 LOAD_NAME                2 (aaa)
#              16 LOAD_CONST               1 (100)
#              18 INPLACE_MULTIPLY
#              20 STORE_NAME               2 (aaa)
#              22 JUMP_ABSOLUTE           10
#         >>   24 POP_BLOCK
#         >>   26 LOAD_CONST               2 (None)
#              28 RETURN_VALUE

dis.dis('''
for i in range(1000000):
  aaa = aaa * 100
''')
# output (trimmed to the loop body):
#   3          14 LOAD_NAME                2 (aaa)
#              16 LOAD_CONST               1 (100)
#              18 BINARY_MULTIPLY
#              20 STORE_NAME               2 (aaa)
#              22 JUMP_ABSOLUTE           10
#         >>   24 POP_BLOCK
#         >>   26 LOAD_CONST               2 (None)
#              28 RETURN_VALUE

Back to your question: which one is absolutely faster?

Unfortunately, it's hard to say, and here's why:

You can check CPython's compile.c directly. If you trace into the CPython code a bit, the call chains differ:

  • inplace_binop -> compiler_augassign -> compiler_visit_stmt
  • binop -> compiler_visit_expr1 -> compiler_visit_expr -> compiler_visit_kwonlydefaults

Since the call chains and logic differ, many factors (including your input size(*), CPU, etc.) can affect performance as well; you'll need to profile your code and optimize based on your use case.

*: as the comments above note, you can check this post for how performance varies with input size.
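For profiling a specific use case, `timeit.repeat` with the minimum over several runs reduces scheduler noise, as the `timeit` documentation suggests. A sketch, with machine-dependent numbers:

```python
import timeit

setup = "import numpy as np; aaa = np.arange(1, 101, 1)"

# repeat() returns one total time per run; the minimum is the run that was
# least disturbed by other processes, so it is the most stable comparison.
best_inplace = min(timeit.repeat("aaa *= 100", setup=setup,
                                 number=10_000, repeat=5))
best_rebind = min(timeit.repeat("aaa = aaa * 100", setup=setup,
                                number=10_000, repeat=5))

print(f"best in-place : {best_inplace:.4f} s")
print(f"best rebinding: {best_rebind:.4f} s")
```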

Kir Chou
  • Many thanks to all! I was not expecting my question to take me so far. I am going to try to understand this very interesting answer. – Stef1611 Apr 20 '21 at 15:06
  • Thanks for your answer. I am beginning Python (I come from the C world) and it is very interesting to see the Python bytecode. You made me discover the dis module. I wonder if it is possible to go deeper, for example down to the asm code? – Stef1611 Apr 20 '21 at 15:18
  • @Stef1611 The missing puzzle includes bytecode and VM, you can initially refer [The AST and Me by Emily Morehouse-Valcarcel @ PyCon 2018](https://www.youtube.com/watch?v=XhWvz4dK4ng) and [this SO answer](https://stackoverflow.com/a/58386510/2740386) for details. To know further, please consider the [CPython internals class by Philip Guo](https://www.youtube.com/playlist?list=PLwyG5wA5gIzgTFj5KgJJ15lxq5Cv6lo_0). – Kir Chou Apr 20 '21 at 15:26
  • Thanks a lot. I think I will be occupied for some days or months ... But very very interesting. – Stef1611 Apr 20 '21 at 15:30

The `+=` symbol appeared in the C language in the 1970s and, in keeping with the C idea of a "smart assembler", corresponds to a distinct machine instruction and addressing mode.

`a = a * 100` and `a *= 100` produce the same effect, but at a low level they correspond to different ways of making the processor work.

`a *= 100` means:

  • find the place identified by `a`
  • multiply it by 100 in place

`a = a * 100` means:

  • evaluate `a * 100`:
      • find the place identified by `a`
      • copy `a` into an accumulator
      • multiply the accumulator by 100
  • store the result in `a`:
      • find the place identified by `a`
      • copy the accumulator into it

Python is implemented in C and inherited its syntax from C, but since an interpreted language performs no translation/optimization pass before execution, the two forms are not necessarily so intimately related (there is one less compilation step). However, an interpreter can dispatch to different execution routines for the different forms of expression, taking advantage of different machine code depending on how the expression is written and on the evaluation context.
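The distinction the steps above describe is directly observable in NumPy: `aaa *= 100` writes into the existing buffer, while `aaa = aaa * 100` allocates a new array and rebinds the name. A sketch using only standard NumPy:

```python
import numpy as np

aaa = np.arange(1, 101, 1)
alias = aaa            # a second reference to the same array object

aaa *= 100             # in-place: the existing buffer is updated
print(aaa is alias)    # True: still the same object; alias sees the new values

aaa = aaa * 100        # rebinding: a fresh array is allocated
print(aaa is alias)    # False: the name now points at a new object
```

This is also why in-place is expected to win for large arrays (no allocation, no copy), while for tiny arrays fixed per-call overheads can dominate either way.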

Petronella