
I am developing a Python application to determine divisibility by repeatedly truncating the final digit of the dividend and transforming it into a smaller number. I am running Python 3.7.10 on Windows 10 with a P5 processor; the code was developed in PyCharm. I want to benchmark division by 16 via a 4-bit right shift against divmod().

import math
import datetime
from sys import getsizeof

def PrintTime(startTime, message):
    # Print the elapsed wall-clock time, in seconds, since startTime.
    seconds_in_day = 24 * 3600
    later_time = datetime.datetime.now()
    difference = later_time - startTime
    time_diff = 10 ** 6 * (difference.days * seconds_in_day + difference.seconds) + difference.microseconds
    print(message + str(time_diff / 10 ** 6))

myVariable = 3 ** 100000 - 1
hLength = math.ceil(math.log(myVariable) / math.log(16))  # number of hex digits in myVariable
dLength = math.ceil(math.log(myVariable) / math.log(10))  # number of decimal digits in myVariable
print('Length in hex ' + str(hLength) + '. Decimal length ' + str(dLength) + ' Size of myVariable ' + str(getsizeof(myVariable)))
n = myVariable

first_time = datetime.datetime.now()
for i1 in range(hLength - 1):
    a = n >> 4    # quotient of division by 16, via a 4-bit right shift
    b = n & 0xF   # remainder of division by 16, via masking the low 4 bits
    n = n - 1761 * b
PrintTime(first_time, 'Hex truncation time: ')

n = myVariable
first_time = datetime.datetime.now()
for i2 in range(hLength - 1):
    n, r = divmod(n, 10)
PrintTime(first_time, 'divmod time - base 10: ')

n = myVariable
first_time = datetime.datetime.now()
for i3 in range(dLength - 1):
    n, r = divmod(n, 16)
PrintTime(first_time, 'divmod time - base 16: ')

This is the output from the application:

Length in hex 39625. Decimal length 47713 Size of myVariable 21160

Hex truncation time: 0.614306

divmod time - base 10: 0.526677

divmod time - base 16: 1.42582

Why is divmod(n, 16) taking nearly three times the execution time of divmod(n, 10)? Am I correct in saying that Python transforms the division by 16 into base 10 and converts the output back to base 16?
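
One way to probe that hypothesis (a minimal sketch, assuming a 64-bit CPython build, where big integers are stored as arrays of 30-bit binary limbs at 4 bytes per limb, plus a 24-byte object header):

import sys

n = 3 ** 100000 - 1
bits = n.bit_length()                     # 158497 bits
limbs = -(-bits // 30)                    # ceil(bits / 30) = 5284 thirty-bit limbs
predicted = sys.getsizeof(0) + 4 * limbs  # 24-byte header + 4 bytes per limb
print(predicted, sys.getsizeof(n))        # both print 21160, matching the output above

The exact match with getsizeof suggests the value is held in binary, not in decimal.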

  • Your benchmark timing code is *terrible*. Don't reinvent it yourself, use the `timeit` module (`min` of a call to `timeit.repeat` is a good approach). The only way to reliably test stuff like this is doing it many times, with various interfering things (e.g. intermittent cyclic garbage collection) disabled, and using precision monotonic timers. Your code does none of this, and the timing results can't be trusted as a result. – ShadowRanger Sep 15 '22 at 20:56 (a sketch of this approach follows these comments)
  • I'll also note, the result of `divmod` should be directly unpacked, not stored raw then indexed (indexing is weirdly high overhead). `x = divmod(n, 10)`, `n = x[0]`, `r = x[1]` simplifies to just `n, r = divmod(n, 10)`. – ShadowRanger Sep 15 '22 at 20:58
  • Final note: `b = n - (a << 4)` is an incredibly inefficient way to get the low bits of `n` when `n` & `a` are large; it involves constructing a large intermediate (`a << 4`) then performing relatively expensive array-based math to subtract it from `n`. The correct complement to right-shifting is masking, so you'd complement `a = n >> 4` with `b = n & 0xF` (which, unlike shifting back and subtracting, can run roughly as fast no matter how large `n` is; the work for masking is `O(min(n, m))` where `n` and `m` are the bit lengths of each operand, while the subtraction alone is `O(max(n, m))`). – ShadowRanger Sep 15 '22 at 21:02
  • Thank you very much for the comments, ShadowRanger. I am an old-time COBOL, Assembler and C programmer. The tips are highly valued. – Sin Keong Tong Sep 16 '22 at 04:38
  • I found that the answer to Question 2 is that the code did not reset the value of n; as a result, it was dividing 0 by 16. ShadowRanger's suggestion to extract the final digit with `b = n & 0xF` reduced the execution time by 40%. – Sin Keong Tong Sep 16 '22 at 05:20
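
Following up on ShadowRanger's timeit suggestion, here is a minimal sketch of a more trustworthy benchmark (the 3 ** 20000 operand and the while-loop bodies are illustrative stand-ins for the original loops, scaled down so each pass stays short):

import timeit

setup = "n0 = 3 ** 20000"

div10 = """
n = n0
while n:
    n, r = divmod(n, 10)
"""

div16 = """
n = n0
while n:
    n, r = divmod(n, 16)
"""

shift_mask = """
n = n0
while n:
    r = n & 0xF   # remainder: mask the low 4 bits
    n >>= 4       # quotient: shift right 4 bits
"""

for label, stmt in (('divmod base 10: ', div10),
                    ('divmod base 16: ', div16),
                    ('shift and mask: ', shift_mask)):
    # min over several repeats filters out scheduler noise; timeit itself
    # disables cyclic garbage collection and uses a monotonic timer
    best = min(timeit.repeat(stmt, setup=setup, repeat=5, number=1))
    print(label + str(round(best, 6)) + ' s')

timeit runs the setup once per measurement, turns off the cyclic garbage collector while timing, and uses a monotonic high-resolution timer by default, which addresses the problems named in the first comment above.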

0 Answers