1

What is the fastest way to remove the N-Rightmost number in python integer?

Here are some of my code:

def f_int(dividend,n):
    return int(dividend/(10 ** n))

def f_str_to_int(dividend,n):
    return int(str(dividend)[:-n])

If we don't care whether the output is in str on int, we can skip the int() in def_f_str_to_int():

def f_str(dividend,n):
    return str(dividend)[:-n]

We also can increase speed by asking the input in the form of power of ten:

divisor_in_power_of_ten = 10 ** n #outside function run time
def f_int_hack (dividend,divisor_in_power_of_ten):
    return int(dividend/(divisor_in_power_of_ten))

My question is, is there a faster way (maybe using bit manipulation)? I need to optimize it since it will be used as part of a real-time web. Note: Restricted only using python, not JS or Cython (it's okay to extend your answer in JS and Cython, but that is not the main question).

Here is my result:

FUNCTION: f_int_hack Used 200170 times
        MEDIAN 1.1000000004202093e-06
        MEAN   1.5179877104460484e-06
        STDEV  3.600025074234889e-05
FUNCTION: f_int Used 199722 times
        MEDIAN 1.8999999999991246e-06
        MEAN   2.420203582980709e-06
        STDEV  3.482858342790541e-05
FUNCTION: f_str Used 200132 times
        MEDIAN 1.4999999997655777e-06
        MEAN   1.7462234924949252e-06
        STDEV  1.4733864640549157e-05
FUNCTION: f_str_to_int Used 199639 times
        MEDIAN 2.000000000279556e-06
        MEAN   2.751038624717222e-06
        STDEV  6.383386278143267e-05

Edit: Function for the benchmark (edit accordingly, since I edit some of them):

import time
import random
import statistics

def benchmark(functions, iteration, *args):
    times = {f.__name__: [] for f in functions}
    
    for i in range(iteration):
        func = random.choice(functions)
        t0 = time.perf_counter()
        func(*args)
        t1 = time.perf_counter()
        times[func.__name__].append(t1 - t0)
    
    for name, numbers in times.items():
        print('FUNCTION:', name, 'Used', len(numbers), 'times')
        print('\tMEDIAN', statistics.median(numbers))
        print('\tMEAN  ', statistics.mean(numbers))
        print('\tSTDEV ', statistics.stdev(numbers))

if __name__=="__main__":
    # Variables
    divident = 12345600000
    n = 3
    iteration = 1000000

    # The functions to compare
    def f_int(divident,n):
        return int(divident/(10 ** n))

    def f_str_to_int(divident,n):
        return int(str(divident)[:-n])

    functions = f_int, f_str_to_int
    benchmark(functions, iteration, divident, n)

More clarification: input is python integer. Actually, i didn't care with the output format (wether it's str or int), but let's make the output in int first and str as "bonus question".

Edit: From comment section a//b:

FUNCTION: f_int_double_slash Used 166028 times
        MEDIAN 1.2000000002565514e-06
        MEAN   1.4399938564575845e-06
        STDEV  1.2767417156171526e-05

YES, IT'S FASTER THAN int(a/b)

Edit: From comment, if we accept float, the fastest way is:

def f_float_hack(dividend,divisor_in_power_of_ten):
    return dividend//divisor_in_power_of_ten

With result:

FUNCTION: f_float_hack Used 142983 times
        MEDIAN 7.000000001866624e-07
        MEAN   9.574040270508322e-07
        STDEV  3.725603760159355e-05

Improvement of int double slash using precomputing power of ten:

FUNCTION: f_int_double_slash_hack Used 143082 times
        MEDIAN 7.999999995789153e-07
        MEAN   1.1596266476572136e-06
        STDEV  4.9442788346866335e-05

Current result: float_hack is the fastest (if we accept float).

Muhammad Yasirroni
  • 1,512
  • 12
  • 22
  • 2
    As a personal curiosity, could you compare a direct integer division (`return dividend//(10 ** n)`) vs. the version with float div + `int()`? – GPhilo Sep 01 '20 at 12:30
  • How big are your numbers? – superb rain Sep 01 '20 at 12:30
  • 1
    Also, since you're showing timing, could you share the code you use to generate those timings? (It makes trying out alternatives much easier, as we wouldn't have to write as much boilerplate code) – GPhilo Sep 01 '20 at 12:32
  • 1
    @GPhilo wihtout int, that's the fastest, `MEAN 8.843473746738569e-07` @superb for now, 13 digits – Muhammad Yasirroni Sep 01 '20 at 12:43
  • @MuhammadYasirroni With that info, I'd wager the pure int-div (i.e., precalculating the power of 10) can't quite be beaten. – GPhilo Sep 01 '20 at 12:45
  • @GPhilo I will add that as bonus answer, since it's not only removing N-Rightmost but adding ".0" in the end. Fortunately, that's what I implement in my web for now. Just forget to add it in my question. I will edit. – Muhammad Yasirroni Sep 01 '20 at 12:55
  • 1
    @MuhammadYasirroni If it adds ".0", you didn't do it right. – superb rain Sep 01 '20 at 12:57
  • It would be faster if you didn't put it into a function. – superb rain Sep 01 '20 at 13:01
  • @superbrain Thanks for noting the function make it slower as expalined in [here](https://stackoverflow.com/questions/14648374/python-function-calls-are-really-slow). But, function in this question is just to make it easier to do comparison between methods. Also as I said, the float method is bonus question, not my original question. – Muhammad Yasirroni Sep 01 '20 at 13:50
  • It's easy [without functions](https://repl.it/repls/HummingLawfulConditions#main.py). – superb rain Sep 01 '20 at 13:56
  • @superbrain thanks for the `from timeit import repeat`. Never know about that before. – Muhammad Yasirroni Sep 01 '20 at 15:10

0 Answers0