Why does this specific code run faster in Python 3.11?

Question

I have the following code in a Python file called benchmark.py.

source = """
for i in range(1000):
    a = len(str(i)) 
"""

import timeit

print(timeit.timeit(stmt=source, number=100000))

When I run it with multiple Python versions, I see a drastic performance difference.

C:\Users\Username\Desktop>py -3.10 benchmark.py
16.79652149998583

C:\Users\Username\Desktop>py -3.11 benchmark.py
10.92280820000451

As you can see, this code runs faster with Python 3.11 than with earlier Python versions. I tried disassembling the bytecode to understand the reason for this behaviour, but the only difference I could see was in the opcode names (CALL_FUNCTION is replaced by the PRECALL and CALL opcodes).

I am not quite sure whether that is the reason for the performance change, so I am looking for an answer that justifies it with reference to the CPython source code.

Python 3.11 bytecode

  0           0 RESUME                   0

  2           2 PUSH_NULL
              4 LOAD_NAME                0 (range)
              6 LOAD_CONST               0 (1000)
              8 PRECALL                  1
             12 CALL                     1
             22 GET_ITER
        >>   24 FOR_ITER                22 (to 70)
             26 STORE_NAME               1 (i)

  3          28 PUSH_NULL
             30 LOAD_NAME                2 (len)
             32 PUSH_NULL
             34 LOAD_NAME                3 (str)
             36 LOAD_NAME                1 (i)
             38 PRECALL                  1
             42 CALL                     1
             52 PRECALL                  1
             56 CALL                     1
             66 STORE_NAME               4 (a)
             68 JUMP_BACKWARD           23 (to 24)

  2     >>   70 LOAD_CONST               1 (None)
             72 RETURN_VALUE

Python 3.10 bytecode

  2           0 LOAD_NAME                0 (range)
              2 LOAD_CONST               0 (1000)
              4 CALL_FUNCTION            1
              6 GET_ITER
        >>    8 FOR_ITER                 8 (to 26)
             10 STORE_NAME               1 (i)

  3          12 LOAD_NAME                2 (len)
             14 LOAD_NAME                3 (str)
             16 LOAD_NAME                1 (i)
             18 CALL_FUNCTION            1
             20 CALL_FUNCTION            1
             22 STORE_NAME               4 (a)
             24 JUMP_ABSOLUTE            4 (to 8)

  2     >>   26 LOAD_CONST               1 (None)
             28 RETURN_VALUE

PS: I understand that Python 3.11 introduced a bunch of performance improvements, but I am curious to understand which optimization makes this code run faster in Python 3.11.

Have you tried to read [the 3.11 changelog](https://docs.python.org/3.11/whatsnew/changelog.html) to see if there's something related to the interpreter itself being optimized? — Some programmer dude, Oct 26 '22 at 11:31
@Someprogrammerdude Thanks, I understand that 3.11 comes with performance improvements but I am curios to understand what optimization makes this code run faster in python 3.11 — Abdul Niyas P M, Oct 26 '22 at 15:42
Interesting question. I have no clue whatsoever, but maybe [this answer](https://stackoverflow.com/a/69821127) would... dunno... give you a starting point? Particularly what it says in the paragraph starting with _In Python 3.11, the frame object will be replaced by an array of structs that won't have an object header._ ? I insist: I have no idea. But it is an interesting question — Savir, Oct 26 '22 at 16:00
Note that Stack Overflow's scope is limited to _practical_ questions. Will knowing this detail change how you go about the practice of software development? — Charles Duffy, Oct 26 '22 at 16:43
Are you using a class structure? Unfortunately, using classes makes source code run slow. — JustBeingHelpful, Oct 26 '22 at 18:37
@CharlesDuffy If you know the reason for an observed speed-up, you might be better able to exploit it. If the answer is something like "it is faster because your code has feature X" and this version of Python handles this feature better than before then you might strive to write code with that feature. In any event -- who says that all questions must be practical? Striving to better understand a programming language is a valuable goal even when it has no immediate payoff. — John Coleman, Oct 26 '22 at 18:37

user2357112 · Accepted Answer · 2022-10-27T10:18:13.190

There's a big section in the "What's New in Python 3.11" page labelled "Faster Runtime". The most likely cause of the speedup here is PEP 659 (the specializing adaptive interpreter), which is a first step towards JIT optimization (perhaps not quite JIT compilation, but definitely JIT optimization).

In particular, the lookup and call for len and str now bypass a lot of dynamic machinery in the overwhelmingly common case where the built-ins aren't shadowed or overridden. The global and builtin dict lookups that resolve each name are skipped on a fast path, and the underlying C routines for len and str are called directly instead of going through the general-purpose function call handling.
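The "aren't shadowed or overridden" condition matters because the fast path has to give way as soon as that assumption stops holding. Here is a minimal sketch of that behaviour, where `digits_total` and `fake_len` are hypothetical names made up for illustration. It warms the loop up, shadows `len` at module level, and checks that the replacement is still honoured. It demonstrates only the observable semantics, not the internal deoptimization mechanism.

```python
def digits_total(n):
    # range, len and str are looked up as globals, falling back to builtins.
    total = 0
    for i in range(n):
        total += len(str(i))
    return total


# Warm up, so that on 3.11+ the interpreter has had a chance to specialize.
for _ in range(100):
    baseline = digits_total(1000)


def fake_len(obj):
    # Hypothetical, deliberately wrong replacement: always reports 1.
    return 1


# Shadow the builtin with a module-level global of the same name.
globals()["len"] = fake_len
shadowed = digits_total(1000)

# Remove the shadowing global again; lookups fall back to the real builtin.
del globals()["len"]
restored = digits_total(1000)

print(baseline, shadowed, restored)
```

The three values differ only because of which `len` was visible at call time, so the specialization stays invisible apart from its speed.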

You wanted source references, so here's one. The str call will get specialized in specialize_class_call:

    if (tp->tp_flags & Py_TPFLAGS_IMMUTABLETYPE) {
        if (nargs == 1 && kwnames == NULL && oparg == 1) {
            if (tp == &PyUnicode_Type) {
                _Py_SET_OPCODE(*instr, PRECALL_NO_KW_STR_1);
                return 0;
            }

where it detects that the call is a call to the str builtin with 1 positional argument and no keywords, and replaces the corresponding PRECALL opcode with PRECALL_NO_KW_STR_1. The handling for the PRECALL_NO_KW_STR_1 opcode in the bytecode evaluation loop looks like this:

        TARGET(PRECALL_NO_KW_STR_1) {
            assert(call_shape.kwnames == NULL);
            assert(cframe.use_tracing == 0);
            assert(oparg == 1);
            DEOPT_IF(is_method(stack_pointer, 1), PRECALL);
            PyObject *callable = PEEK(2);
            DEOPT_IF(callable != (PyObject *)&PyUnicode_Type, PRECALL);
            STAT_INC(PRECALL, hit);
            SKIP_CALL();
            PyObject *arg = TOP();
            PyObject *res = PyObject_Str(arg);
            Py_DECREF(arg);
            Py_DECREF(&PyUnicode_Type);
            STACK_SHRINK(2);
            SET_TOP(res);
            if (res == NULL) {
                goto error;
            }
            CHECK_EVAL_BREAKER();
            DISPATCH();
        }

which consists mostly of a bunch of safety prechecks and reference fiddling wrapped around a call to PyObject_Str, the C routine for calling str on an object.
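If you want to see the rewritten opcodes without reading the C source, `dis` accepts an `adaptive=True` flag from Python 3.11 onwards. The following is a rough sketch, assuming a hypothetical function `count_digits` that mirrors the benchmarked loop. It sits in a function so that the names compile to LOAD_GLOBAL rather than LOAD_NAME (as far as I can tell, timeit also runs the statement inside a generated function). The exact specialized opcode names differ between CPython versions, so treat whatever it prints as informational.

```python
import dis


def count_digits(n):
    # Mirrors the benchmarked loop, but inside a function body.
    total = 0
    for i in range(n):
        total += len(str(i))
    return total


# Run the function several times so the adaptive interpreter can warm up.
for _ in range(50):
    result = count_digits(1000)

try:
    # adaptive=True is available on Python 3.11 and later.
    instructions = list(dis.get_instructions(count_digits, adaptive=True))
except TypeError:
    # Older versions have no adaptive flag (and no specialized opcodes).
    instructions = list(dis.get_instructions(count_digits))

opnames = [ins.opname for ins in instructions]

# Show only the global-lookup and call instructions, in bytecode order.
interesting = [name for name in opnames if "CALL" in name or "LOAD_GLOBAL" in name]
print(interesting)
```

On 3.11 I would expect the list to include specialized names such as PRECALL_NO_KW_STR_1 once the loop is warm, whereas 3.10 shows only the generic LOAD_GLOBAL and CALL_FUNCTION.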

Python 3.11 includes many other performance enhancements besides the above, including optimizations to stack frame creation, method lookup, common arithmetic operations, interpreter startup, and more. Most code should run much faster now, barring things like I/O-bound workloads and code that spends most of its time in C library code (like NumPy).

For fun, [here](https://i.stack.imgur.com/gQWJV.png) is what [`specialist`](https://github.com/brandtbucher/specialist) shows for this script. — tripleee, Oct 27 '22 at 10:29
