17

I know integers are immutable, so computing a new value does not modify the original integer. The in-place operations should therefore do the same work as the simple ones: (1) compute the value, and (2) rebind the variable to the result. So why are the in-place operations slower than the simple ones?

import timeit
print("a = a + 1: ", end="")
print(timeit.timeit("for i in range(100): a = a + 1", setup="a = 0"))
print("a += 1: ", end="")
print(timeit.timeit("for i in range(100): a += 1", setup="a = 0"))

print("a = a - 1: ", end="")
print(timeit.timeit("for i in range(100): a = a - 1", setup="a = 0"))
print("a -= 1: ", end="")
print(timeit.timeit("for i in range(100): a -= 1", setup="a = 0"))

print("a = a * 1: ", end="")
print(timeit.timeit("for i in range(100): a = a * 1", setup="a = 1"))
print("a *= 1: ", end="")
print(timeit.timeit("for i in range(100): a *= 1", setup="a = 1"))

print("a = a // 1: ", end="")
print(timeit.timeit("for i in range(100): a = a // 1", setup="a = 1"))
print("a //= 1: ", end="")
print(timeit.timeit("for i in range(100): a //= 1", setup="a = 1"))

Output:

a = a + 1: 2.922127154
a += 1: 2.9701245480000003
a = a - 1: 2.9568866799999993
a -= 1: 3.1065419050000003
a = a * 1: 2.2483990140000003
a *= 1: 2.703524648
a = a // 1: 2.534561783000001
a //= 1: 2.6582312889999997

All the in-place operations are slower than the simple ones. Addition has the smallest difference while multiplication has the greatest.

Boann
adamkwm
  • A similar thread: https://stackoverflow.com/questions/47307518/which-operator-vs-should-be-used-for-performance-in-place-vs-not-in-pla – Dhinesh Sunder Ganapathi May 21 '21 at 09:57
  • 1
    I got a different result: `//=` is faster than `a = a // b`, and `+=` is faster than `a = a + b`. – S.B May 21 '21 at 10:32
  • The difference between `a = a + b` and `a += b` seems to be small enough that the in-place one is sometimes faster, but across a few repeated tries I still found the simple one to be faster most of the time. – adamkwm May 21 '21 at 11:20
  • 3
    I suspect the difference is due to `int` objects not actually implementing the in-place operators, so there's some overhead when `int.__iadd__` is checked, found not to exist, and then it does `__add__` instead... although this doesn't explain the relative difference... – juanpa.arrivillaga May 21 '21 at 11:37
  • What does the Python bytecode say? – qwr May 21 '21 at 18:34
  • 3
    The delta I get between runs is quite a bit larger than the delta between the two types of operations. I'm not sure you can draw any real conclusions here. – Tim Roberts May 21 '21 at 18:35
  • The only difference in the bytecode is the opcode `BINARY_ADD` vs `INPLACE_ADD` (see the `dis` sketch after these comments). – Tim Roberts May 21 '21 at 18:38
  • Just speculation: CPython stores small integers as "C int types" and large integers as "lists of digits". Maybe the in-place operations check for overflow. – hilberts_drinking_problem May 21 '21 at 19:07
  • @hilberts_drinking_problem why would it? All the numeric types are immutable, a new object is created even when you use the in-place operator – juanpa.arrivillaga May 21 '21 at 19:56
  • To be clear, `+=` is not an "in-place" operator - [in-place](https://en.wikipedia.org/wiki/In-place_algorithm) means it should mutate the input without allocating new memory proportional to the size of the input; but `+=` on integers does not mutate the original integer, it creates a new object which needs memory allocated for it. The Python language reference defines `+=` as an [augmented assignment](https://docs.python.org/3/reference/simple_stmts.html#augmented-assignment-statements) operator, and only says that classes implementing such operators should do so in-place "when possible". – kaya3 Jun 13 '21 at 11:01
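
A minimal sketch of the bytecode comparison mentioned in the comments above, assuming CPython 3.8–3.10 (on 3.11+ both statements compile to a single BINARY_OP opcode):

import dis

# The two statements compile to identical bytecode except for one opcode:
dis.dis("a = a + 1")   # LOAD_NAME, LOAD_CONST, BINARY_ADD, STORE_NAME
dis.dis("a += 1")      # LOAD_NAME, LOAD_CONST, INPLACE_ADD, STORE_NAME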

2 Answers

3

There may be a problem in that experiment: a single for-loop with 100 iterations containing only an assignment statement like `a = a + 1` or `a += 1` normally does not take that long to run (more than a second).

Compare those results using `timeit` to the following direct execution of the same for-loop:

def not_in_place(n, a=0):
    for i in range(n): a = a + 1
    return a

def in_place(n, a=0):
    for i in range(n): a += 1
    return a

As expected, it takes almost 100 million iterations to get times of a similar magnitude (seconds):

not_in_place(100_000_000)
in_place(100_000_000)
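
As a rough check, using the functions above with time.perf_counter (a sketch), each call should land in the seconds range:

import time

start = time.perf_counter()
not_in_place(100_000_000)
print("a = a + 1:", time.perf_counter() - start, "seconds")

start = time.perf_counter()
in_place(100_000_000)
print("a += 1:", time.perf_counter() - start, "seconds")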

(Edit: as pointed out in the comments, the timed statement, which itself contains 100 iterations, is wrapped by `timeit` in a one-million-run loop by default.)
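
To make that explicit, `timeit`'s default can be spelled out (a sketch; `number=1_000_000` is simply the default value):

import timeit

# timeit executes the statement `number` times; the default is 1,000,000,
# so the benchmark above performed 100 * 1,000,000 additions per measurement.
print(timeit.timeit("for i in range(100): a += 1", setup="a = 0", number=1_000_000))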

We still need to establish which is faster: in-place (`a += 1`) or not in-place (`a = a + 1`).

To do so, we can observe how both cases behave as the number of iterations grows, using the third-party perfplot library:

import perfplot
perfplot.bench(
    n_range=[2**k for k in range(26)],
    setup=lambda n: n,
    kernels=[not_in_place, in_place],
    labels=["a = a + 1", "a += 1"],
).show()

The difference observed over the first 100 iterations with `timeit` could not be replicated in large-scale runs using the same for-loop code in the kernel functions:

[perfplot graph: runtime vs. number of iterations for `a = a + 1` and `a += 1`; the two curves nearly overlap]

Most importantly, the time difference between the two cases (in-place and not in-place) appears to be invariant to the number of iterations.

The same goes for the other operators (`-`, `*`, etc.):

[perfplot graphs for the remaining operators, showing the same near-overlap between the in-place and simple forms]

Marco Oliveira
2

Short answer

The in-place operation does slightly more work because it has to determine whether a custom in-place operation has been defined, or whether to fall back to the normal binary operation.
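
A minimal sketch of that dispatch behaviour at the Python level (the class names here are made up for illustration):

class OnlyAdd:
    """Defines __add__ but not __iadd__, just like int."""
    def __init__(self, x):
        self.x = x
    def __add__(self, other):
        return OnlyAdd(self.x + other)

class WithIadd(OnlyAdd):
    """Also defines __iadd__, so += can work in place."""
    def __iadd__(self, other):
        self.x += other
        return self

a = OnlyAdd(1)
a += 1    # no __iadd__ found, so Python falls back to __add__ and rebinds a
b = WithIadd(1)
b += 1    # __iadd__ is found and called directly; b is mutated in place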

Timings

For me, the timings are almost identical:

$ python3.9 -m timeit -s 'a=1' 'a *= 1'
10000000 loops, best of 5: 27.6 nsec per loop

$ python3.9 -m timeit -s 'a=1' 'a = a * 1'
10000000 loops, best of 5: 27.8 nsec per loop

Explanation

I would expect the in-place version to be microscopically slower because the dispatch code first checks whether an in-place slot is defined and then falls back to the regular binary operation.

It takes a little time to determine that the in-place slot is not defined for immutable objects like `int` and `float`.
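
You can see from Python that `int` lacks the in-place slot while a mutable type like `list` has it:

>>> hasattr(int, '__iadd__')
False
>>> hasattr(list, '__iadd__')   # lists implement += in place
True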

That said, almost all of the rest of the code path is the same, which is why the timings are so close.

Dive into the source

The relevant code is in Objects/abstract.c:

/* The in-place operators are defined to fall back to the 'normal',
   non in-place operations, if the in-place methods are not in place.

   - If the left hand object has the appropriate struct members, and
     they are filled, call the appropriate function and return the
     result.  No coercion is done on the arguments; the left-hand object
     is the one the operation is performed on, and it's up to the
     function to deal with the right-hand object.

   - Otherwise, in-place modification is not supported. Handle it exactly as
     a non in-place operation of the same kind.

   */

static PyObject *
binary_iop1(PyObject *v, PyObject *w, const int iop_slot, const int op_slot)
{
    PyNumberMethods *mv = Py_TYPE(v)->tp_as_number;
    if (mv != NULL) {
        binaryfunc slot = NB_BINOP(mv, iop_slot);
        if (slot) {
            PyObject *x = (slot)(v, w);
            if (x != Py_NotImplemented) {
                return x;
            }
            Py_DECREF(x);
        }
    }
    return binary_op1(v, w, op_slot);
}
Raymond Hettinger
  • I also tried `timeit` in CLI form, but my timing difference is larger: 20.4 nsec for `a = a * 1` and 24.7 nsec for `a *= 1`. Maybe this is a difference between systems? I'm using Windows 7 with Python 3.8.4. – adamkwm Jun 13 '21 at 11:28