3

Looking for insight into how to optimize the speed of a Python script.

Challenge: "Which positive integers are equal to 23 times the sum of their digits" >> this is my functioning code:

def sum_of_digits(number):
    sum = 0;
    for c in number:
        sum += ord(c) - 48;
    
    return sum;

upper_limit = 10000000000;
count = 0;
for i in range(1, upper_limit, 1):
    if(23 * sum_of_digits(str(i)) == i):
        print(i);
        count += 1;

Which correctly spits out 207 as an answer (2+0+7 = 9. 9 x 23 = 207).

Problem is the time for execution scales as (n).

Whilst I can pre calculate the sum_of_digits - this wont save time and am looking for Python methods to speed to execution.

I know I can delve into number theory and "maths" it - but am interested in what I can squeeze out of Python.

Finally, I know other languages might be quicker, but I'm a high school physics teacher using this as a STEM example in a summer class.

Asocia
  • 5,935
  • 2
  • 21
  • 46
NimueSTEM
  • 193
  • 2
  • 13
  • Iteration in Python is pretty slow. Vectorization is my go-to approach, since it solves the iteration problem and makes it easier to off-load calculations to libraries like NumPy etc. (which are a lot faster than regular Python, usually because they are implemented in C). I don't know of many other well-working solutions to speeding up Python, other than re-structuring the code or leveraging native libraries. – Morten Jensen Jul 03 '20 at 09:15

3 Answers3

4

First of all, this is Python so you don't need to put ; at the end of statements. Since you are looking for numbers which are multiple of 23, you can gain some performance if you iterate over the numbers that are multiples of 23:

for i in range(0, upper_limit, 23):
    # 'i' will take values 0, 23, 46, ...

And your sum_of_digits function would perform better if you work on ints. Taken from here:

def sum_digits(n):
   r = 0
   while n:
       r, n = r + n % 10, n // 10
   return r

So the final code will be:

upper_limit = 10000000000
count = 0
for i in range(0, upper_limit, 23):
    if (23 * sum_digits(i) == i):
        print(i)
        count += 1

print(count)

However, this will still perform in O(n*m) because you have nested loops.

Instead of trying to find such numbers blindly, we can do a little bit math and break the loop.

For 1 digit numbers 23 * sum_digits(i) would be 23 * 9 = 207 at maximum.
For 2 digit numbers 23 * sum_digits(i) would be 23 * 18 = 414 at maximum.
For 3 digit numbers 23 * sum_digits(i) would be 23 * 27 = 621 at maximum.
For 4 digit numbers 23 * sum_digits(i) would be 23 * 36 = 828 at maximum.
For 5 digit numbers 23 * sum_digits(i) would be 23 * 45 = 1035 at maximum.
.
.
.

You can see the pattern here. We actually don't need to search for the numbers which have 4 digits or more. Because all the 4 digit numbers are greater than 828 and all the 5 digit numbers are greater than 1035 etc. So here is a better solution which breaks when we are sure that all the remaining numbers won't be valid.

def sum_digits(n):
   r = 0
   while n:
       r, n = r + n % 10, n // 10
   return r

upper_limit = 10000000000
count = 0
for i in range(0, upper_limit, 23):
    number_length = len(str(i))
    if number_length * 9 * 23 < i:
        break
    if 23 * sum_digits(i) == i:
        print(i)
        count += 1

print(count)
Asocia
  • 5,935
  • 2
  • 21
  • 46
  • 1
    Technically it's O(m*n), there are nested loops but they're working on different things. And the second one is short enough as to be constant. – Masklinn Jul 03 '20 at 09:25
  • Also for a (very minor) improvement you could pre-compute a table of, say, summing the digits from 0 to 1000. This way you can step by 1000 instead of 10 and cut down the amount of summing a bit. Sadly it doesn't cut down on the main cost factor (which is calling sum_digits a lot), only on the cost of sum_digits for larger numbers. – Masklinn Jul 03 '20 at 09:38
  • 2
    @Masklinn I edited my answer, thanks. And MrGPG, you can also check the edit I made. It will now terminate so much earlier. – Asocia Jul 03 '20 at 09:45
  • @Asocia - great description -- love this --- thank you. – NimueSTEM Jul 03 '20 at 11:54
  • 1
    shouldn't this work as well? `if number_length * 9 * 23 < i:` – gelonida Jul 03 '20 at 12:14
  • @gelonida Yeah, that would also work. I just wanted to play safe :) – Asocia Jul 03 '20 at 12:39
1

The code you posted runs in a loop- 1 calculation at a time. You can vectorize it by removing the loops, and thereby do multiple calculations at once. This will speed up the execution time. I suggest using numpy to achieve this. Some sample code given below, here I check where the range of values is divisible by 23 without a remainder:

import numpy as np
# Create range of values
test_arr = np.arange(10000)
# Find values where condition is true
is_divisible_23 = np.where(test_arr % 23 == 0)
# Print the result
print(is_divisible_23)

Returns:

(array([   0,   23,   46,   69,   92,  115,  138,  161,  184,  207,  230...]), dtype=int64)

You'll just need to write a digit sum function that can work on a numpy array. Additionally, you likely won't have enough memory to pull off the entire calculation at once (10000000000 values would need 74.5GiB of memory) but you could do it in chunks. Just generate a new range where you left off:

np.arange(10000, 100000)
Shan S
  • 658
  • 5
  • 18
0

Since you are working with multiple variables that could be typed as int and double I would say that Cython is a great option to take a look at. It is a module that lets you specify the type of a variable and then compile the script to c-code.

In this case I am certain that it would make an order of magnitude of improvment.

JakobVinkas
  • 1,003
  • 7
  • 23