1

I am wondering why this function:

def digit(k):
    return len(str(k))

is faster than this one ? :

def digit(k):
    i = 0
    while k != 0:
        k = k // 10
        i += 1
    return i

And why it's the opposite for example in C?

Regis
  • 53
  • 6
  • 6
    FWIW, `len` already returns an `int`, no need to convert it again. And why it's faster: because `str` and `len` are implemented in C and highly optimised, while a loop in userland code requires a lot more steps to complete. – deceze Nov 26 '18 at 12:55
  • @deceze It's claimed, that the loop is faster in C. I'm not sure about this, but, like stated, in Python the loop is slower - which maybe is because of what you said... – Kai Huppmann Nov 26 '18 at 13:06
  • 1
    @Kai A C compiler could highly optimise such a loop. I don't know C well enough to support or refute the claim that it'll actually be faster. But Python most certainly doesn't do that level of optimisation and will actually execute all those steps in the loop, which should quite obviously be slower. – deceze Nov 26 '18 at 13:08
  • @deceze Now I got your comment :) ... Sorry – Kai Huppmann Nov 26 '18 at 13:11
  • 1
    Why don't you provide the C code that you claim is different? – GermanNerd Nov 26 '18 at 13:19
  • @deceze I already know that... And no it's not the answer, i think you need to take a look at this : https://rushter.com/blog/python-integer-implementation/ I just don't really understand – Regis Nov 26 '18 at 14:00
  • 2
    …?! ‍♂️ You already know that a whole lot of userland code will be slower than two function calls to C code… *and it's not the answer?!* How does that work? – deceze Nov 26 '18 at 14:01
  • Forgot C... I just asked why is it faster in Python, and how does it work ? – Regis Nov 26 '18 at 14:04

1 Answers1

4

Let's look at what happens if we take your Python code and translate it as literally as possible to C. We can do that very easily with Cython:

# save this in a file named "testmod.pyx" and compile it with Cython and a
# C compiler - details vary depending on OS and Python installation

from libc.stdio cimport snprintf
from libc.string cimport strlen

def c_digit_loop(k_):
    cdef unsigned int k = k_
    cdef int i = 0
    while k != 0:
        k = k // 10
        i += 1
    return i

def c_digit_str(k_):
    cdef unsigned int k = k_
    cdef char strbuf[32] # more than enough for any 'unsigned int'
    snprintf(strbuf, sizeof(strbuf), "%u", k);
    return strlen(strbuf);

The machine code you get from this is not as optimal as it could be, but it's close enough for a quick test. This allows us to compare performance directly using timeit, like this:

# save this in a file named 'test.py' and run it using the
# same CPython you compiled testmod.pyx against

import timeit
from testmod import c_digit_loop, c_digit_str

def py_digit_loop(k):
    i = 0
    while k != 0:
        k = k // 10
        i += 1
    return i

def py_digit_str(k):
    return len(str(k))

def test1(name):
    print(name, timeit.timeit(name+"(1234567)", "from __main__ import "+name,
                              number=10000))

test1("py_digit_loop")
test1("py_digit_str")
test1("c_digit_str")
test1("c_digit_loop")

When I run this program, this is the output I get on the computer where I'm typing this. I've manually lined up the numbers to make them easier to compare by eye.

py_digit_loop 0.004024484000183293
py_digit_str  0.0020454510013223626
c_digit_str   0.0009924650003085844
c_digit_loop  0.00025072999960684683

So that confirms your original assertion: the loop is slower than converting to a string in Python, but in C it's the other way around. But notice that converting to a string in C is still faster than converting to a string in Python.

To know precisely why this is happening we would need to dig deeper into the guts of the Python interpreter than I feel like doing this morning, but I know enough about its guts already to tell you in outline. The CPython interpreter is not very efficient. Even operations on small integers involve reference counting and construction of scratch objects on the heap. Your loop that does basic arithmetic in Python requires one or two scratch objects per iteration (depending on whether 0, 1, 2, ... are "interned"). Doing the calculation by converting to a string and taking its length involves creating only one temporary object, the string, for the whole calculation. The bookkeeping involved with these scratch objects dwarfs the cost of the actual calculation, for both of the Python implementations.

The C string-based implementation performs almost exactly the same steps that the Python string-based implementation performs, but its scratch object is a char array on the stack, not a full-fledged Python string object, and that all by itself is apparently good for a 40-50% speedup.

The C loop-based implementation compiles down to eight machine instructions for the actual loop. No memory accesses. Not even a hardware divide instruction (that's the magic of strength reduction). And then hundreds more instructions dealing with the Python object model. Most of those 0.00025 seconds are still overhead.

zwol
  • 135,547
  • 38
  • 252
  • 361