
Description of problem

I need to get the first digit of a given integer. This operation will be performed millions of times, so I need to make sure that I use the most efficient way of doing it.

In case the length of the integer affects the answer: in my case, the integer will always be a 2-digit number.

What I tried

I have tried the methods below. Methods 1 and 2 seem slow since they convert back and forth between int and str. Method 3 uses // and %, which I assume are also expensive. Is there a better way of performing this seemingly "simple" task?

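# Method 1: take the first character of the decimal string
first_digit = int(str(x)[0])

# Method 2: note that [1:] keeps everything *after* the first character,
# so for a 2-digit x this actually returns the second (last) digit
first_digit = int(str(x)[1:])

# Method 3: drop the last digit, then keep the last digit of what remains
first_digit = x // 10 % 10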
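If the length did matter, one general approach (a sketch, not something proposed in this thread) is to keep dividing by 10 until a single digit remains. The function name `first_digit` is hypothetical, reused from the variable above. For a 2-digit input the loop runs exactly once, so it reduces to Method 3 without the `% 10`.

```python
def first_digit(x: int) -> int:
    """Return the leading decimal digit of a non-negative integer x.

    Repeatedly drops the last digit until one digit is left, so it works for
    any length. For a 2-digit x the loop body runs exactly once (x // 10).
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    while x >= 10:
        x //= 10  # drop the last decimal digit
    return x


# Agrees with the string-based Method 1 on a range of inputs.
assert all(first_digit(n) == int(str(n)[0]) for n in range(10_000))
```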
eligolf
  • If doing this millions of times, it might be better to use numpy instead of vanilla Python. – Umar.H Oct 16 '20 at 11:40
  • Did you try and do any timing of these different options? Are you sure that this will be a bottleneck in whatever you're going to use it, and that you're not trying to prematurely optimize? – Thierry Lathuille Oct 16 '20 at 11:40
  • Have you actually timed a million such operations? – Abhinav Mathur Oct 16 '20 at 11:48
  • I tried to time it now with the functions separated, and it runs for a long time. But when I print the actual numbers, it says 0.0, which is weird. However, I tried using cProfile, and it told me that the expression int(str(x)[1:]) was among the 5 largest time consumers in my code, hence my question. – eligolf Oct 16 '20 at 12:00
  • I advise you to take a look at https://stackoverflow.com/questions/5558492/divide-by-10-using-bit-shifts if you plan to use Numpy, Numba, Cython or PyPy. – Jérôme Richard Oct 16 '20 at 13:23
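The linked question is about replacing a division by 10 with a multiplication and a bit shift. Here is a minimal Python sketch of that idea, assuming inputs in the question's range. `div10_shift` is a hypothetical name, and the constants 205 and 11 are one common choice for small inputs (205/2048 is slightly above 1/10, and the result is exact for 0 <= x <= 1028). In CPython this is unlikely to beat plain `x // 10`, since every operator is still interpreted; as the comment says, the trick is aimed at Numpy, Numba, Cython or PyPy.

```python
def div10_shift(x: int) -> int:
    """Compute x // 10 as (x * 205) >> 11, valid for 0 <= x <= 1028.

    205 / 2048 = 1/10 + 1/10240, so the result only overshoots once the
    accumulated error x / 10240 pushes past the next multiple of 10
    (first at x = 1029). All 2-digit inputs are far below that limit.
    """
    return (x * 205) >> 11


# Exhaustively confirm the trick for the 2-digit range used in the question.
assert all(div10_shift(x) == x // 10 for x in range(10, 100))
```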

2 Answers


If the number never has more than 2 digits, the % 10 is useless. But could it have a single digit as well? In that case the result would be zero, which is wrong. So, assuming that the number never has more than 2 digits, the formula could be:

def first_digit(x):
    # assumes 0 <= x <= 99
    if x > 9:
        return x // 10
    return x
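A quick exhaustive check of this reasoning over 0..99, using Method 1 from the question as the reference answer. This check is an illustration added here, not part of the original answer:

```python
# Exhaustive check over 0..99 (the question's range plus single digits).
for x in range(100):
    leading = int(str(x)[0])  # Method 1 from the question: reference answer
    # For x <= 99, x // 10 is already < 10, so the extra % 10 changes nothing.
    assert x // 10 % 10 == x // 10
    # The conditional form also handles single-digit x correctly.
    assert (x // 10 if x > 9 else x) == leading
    # Plain x // 10 is correct whenever x really has two digits.
    if x >= 10:
        assert x // 10 == leading
```

If every input is guaranteed to be in 10..99, the branch is always taken, so plain `x // 10` gives the same result.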
dspr

I used the timeit module to time your methods, as well as dspr's, over 10 million repetitions:

from timeit import timeit

n = 10000000
print(timeit(stmt='import random; n = random.randint(10, 99); x = int(str(n)[0])', number=n))
print(timeit(stmt='import random; n = random.randint(10, 99); x = int(str(n)[1:])', number=n))
print(timeit(stmt='import random; n = random.randint(10, 99); x = n // 10 % 10', number=n))
print(timeit(stmt='import random; n = random.randint(10, 99); x = n//10 if n>9 else n', number=n))

which gave me the following results:

10.7325472
11.0877854
8.493264900000003
8.550117300000004

It seems that the x // 10 % 10 method is a little faster than the others.

  • the same (for 100000 only) on https://repl.it/languages/python3 gives 1.6996094920032192 0.9015585849992931 0.7443572329939343 0.5641695599915693 – dspr Oct 16 '20 at 12:44
  • The timings include the generation of the random number and the import, which are much slower. Thus, the timings are not very accurate. – Jérôme Richard Oct 16 '20 at 12:49
  • You're right. With a fixed integer (x = 57), I get these results on an iMac i7 4 GHz: 5.5298409462 5.52825784683 0.631176948547 0.556658983231, while the first was more than 19ms using a random number. – dspr Oct 16 '20 at 12:59
  • @JérômeRichard Yes, the timings are not accurate, but the __relative__ timings between methods are, as both the import and the number-generation statements appear in all the `timeit` calls. Or am I missing something? – Bastien Broussard Oct 16 '20 at 13:06
  • @BastienBroussard Not really. On my machine, `import random; n = random.randint(10, 99); x = n // 10 % 10` gives 5.23, while `import random; n = random.randint(10, 99); x = n // 10 % 10; y = n // 10 % 10` gives 5.54. We would expect the second result to be twice the first. The variation of the random generation might be significant compared to the computation of `x`. As a result, you cannot be sure, statistically speaking, that `n // 10 % 10` is faster than `n//10 if n>9 else n`, because of the very slight gap between the last two timings. I advise using the `setup` argument of `timeit`. – Jérôme Richard Oct 16 '20 at 13:19
  • How is "n // 10 % 10" faster than "n // 10"?? Is it the if condition? Because I know that all my integers are of length 2. – eligolf Oct 16 '20 at 15:23
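Following the suggestion to use the `setup` argument, here is a sketch that builds the random inputs once in `setup`, so that only the digit extraction is timed, and reports the best of several runs via `timeit.repeat`. It also includes plain `x // 10`, which is enough when all inputs are 2-digit. The names `SETUP`, `STATEMENTS` and `benchmark` are hypothetical. Each statement loops over a prebuilt list `xs`, so the loop overhead is included equally in all four timings. No results are shown because they depend on the machine and the Python version.

```python
from timeit import repeat

# Build the inputs once in `setup`, so only the digit extraction is timed.
SETUP = (
    "import random; random.seed(0); "
    "xs = [random.randint(10, 99) for _ in range(1000)]"
)

STATEMENTS = {
    "int(str(x)[0])": "for x in xs: int(str(x)[0])",
    "x // 10 % 10": "for x in xs: x // 10 % 10",
    "x // 10 if x > 9 else x": "for x in xs: x // 10 if x > 9 else x",
    "x // 10": "for x in xs: x // 10",
}


def benchmark(number: int = 200, runs: int = 5) -> dict:
    """Return the best-of-`runs` total time in seconds for each statement."""
    results = {}
    for label, stmt in STATEMENTS.items():
        # repeat() returns one total per run; the minimum is least affected by noise.
        results[label] = min(repeat(stmt=stmt, setup=SETUP, number=number, repeat=runs))
    return results


if __name__ == "__main__":
    for label, seconds in benchmark().items():
        print(f"{label:26s} {seconds:.4f} s")
```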