
I wrote this solution to Project Euler #5.

import time
start_time = time.time()

def ProjectEulerFive(m=20):

    a = m
    start = 2
    while (m % start) == 0:
        start += 1

    b = start
    while b < m:
        if (a % b) != 0:
            a += m
            b = start
            continue
        else:
            b += 1
    return a

import sys

if len(sys.argv) > 2:
    print "error: this function takes a max of 1 argument"
elif len(sys.argv) == 2:
    print ProjectEulerFive(int(sys.argv[1]))
else:
    print ProjectEulerFive()

print "took " + str(time.time() - start_time ) + " seconds"

Takes my system about 8.5 seconds.

Then I decided to compare with other people's solutions, and found this question: Project Euler 5 in Python - How can I optimize my solution?.

I hadn't thought of unique prime factorization.

But anyway, here is one supposedly optimized solution posted there that doesn't use prime factorization:

import time
start_time = time.time()

check_list = [11, 13, 14, 16, 17, 18, 19, 20]

def find_solution(step):
    for num in xrange(step, 999999999, step):
        if all(num % n == 0 for n in check_list):
            return num
    return None

if __name__ == '__main__':
    solution = find_solution(20)
    if solution is None:
        print "No answer found"
    else:
        print "found an answer:", solution

    print "took " + str(time.time() - start_time ) + " seconds"

Takes my system about 37 seconds.

My code is about 4 times faster even though I unnecessarily check for divisibility by 3, 4, 5, 6, 7, 8, 9, 10, and 12.
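As an aside, the reduced check list really is sufficient: a number divisible by everything in 11..20 is automatically divisible by everything in 1..20, because the two ranges have the same least common multiple. A quick sanity check of that claim, written in Python 3 (`math.gcd` is in the stdlib since 3.5, unlike the Python 2 code in this post):

```python
from math import gcd
from functools import reduce

def lcm_of(nums):
    # fold the pairwise lcm over the sequence
    return reduce(lambda a, b: a * b // gcd(a, b), nums, 1)

# checking 11..20 is equivalent to checking 1..20:
# the two ranges share the same least common multiple
assert lcm_of(range(11, 21)) == lcm_of(range(1, 21)) == 232792560
```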

I'm new to python, and having trouble seeing where the inefficiency is coming from.

EDIT:

I did another test.

import time
start_time = time.time()

def ProjectEulerFive(m=20):
    ls = [11, 13, 14, 15, 16, 17, 18, 19]
    a = m
    i = 0
    while i < len(ls):
        if (a % ls[i]) != 0:
            a += m
            i = 0
            continue
        else:
            i += 1
    return a

print ProjectEulerFive()
print "took " + str(time.time() - start_time ) + " seconds"

Takes my system 6 seconds, but this:

import time
start_time = time.time()

def ProjectEulerFive(m=20):

    a = m
    start = 11

    b = start
    while b < m:
        if (a % b) != 0:
            a += m
            b = start
            continue
        else:
            b += 1
    return a

print ProjectEulerFive()
print "took " + str(time.time() - start_time ) + " seconds"

Takes about 3.7 seconds.

MVTC
  • Maybe an interesting question on an academic level, but if you were writing a real application, you would either be happy now that the problem is solved, or, in that order: 1. try a faster algorithm, 2. use a library that does the performance-critical stuff already, 3. run the code with pypy, 4. rewrite it in C (possibly as a Python extension via swig or cython). But micro-optimizing interpreted code... nope ;-) – maxy Jul 15 '12 at 17:24

5 Answers


I see that although a faster solution has been posted, no one has actually answered the question. It's a rather difficult one to answer, in fact! The fundamental explanation is that function calls are relatively expensive. In order to make this conclusion persuasive, though, I'll have to dig fairly deeply into Python internals. Prepare yourself!

First of all, I'm going to disassemble (your third version of) ProjectEulerFive and find_solution from the "optimized" solution, using dis.dis. There's a lot here, but a quick scan is all that's required to confirm that your code calls no functions at all:

>>> dis.dis(ProjectEulerFive)
  2           0 LOAD_FAST                0 (m)
              3 STORE_FAST               1 (a)

  3           6 LOAD_CONST               1 (11)
              9 STORE_FAST               2 (start)

  4          12 LOAD_FAST                2 (start)
             15 STORE_FAST               3 (b)

  5          18 SETUP_LOOP              64 (to 85)
        >>   21 LOAD_FAST                3 (b)
             24 LOAD_FAST                0 (m)
             27 COMPARE_OP               0 (<)
             30 POP_JUMP_IF_FALSE       84

  6          33 LOAD_FAST                1 (a)
             36 LOAD_FAST                3 (b)
             39 BINARY_MODULO       
             40 LOAD_CONST               2 (0)
             43 COMPARE_OP               3 (!=)
             46 POP_JUMP_IF_FALSE       71

  7          49 LOAD_FAST                1 (a)
             52 LOAD_FAST                0 (m)
             55 INPLACE_ADD         
             56 STORE_FAST               1 (a)

  8          59 LOAD_FAST                2 (start)
             62 STORE_FAST               3 (b)

  9          65 JUMP_ABSOLUTE           21
             68 JUMP_ABSOLUTE           21

 11     >>   71 LOAD_FAST                3 (b)
             74 LOAD_CONST               3 (1)
             77 INPLACE_ADD         
             78 STORE_FAST               3 (b)
             81 JUMP_ABSOLUTE           21
        >>   84 POP_BLOCK           

 12     >>   85 LOAD_FAST                1 (a)
             88 RETURN_VALUE        

Now let's look at find_solution:

>>> dis.dis(find_solution)
  2           0 SETUP_LOOP              58 (to 61)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_FAST                0 (step)
              9 LOAD_CONST               1 (999999999)
             12 LOAD_FAST                0 (step)
             15 CALL_FUNCTION            3
             18 GET_ITER            
        >>   19 FOR_ITER                38 (to 60)
             22 STORE_DEREF              0 (num)

  3          25 LOAD_GLOBAL              1 (all)
             28 LOAD_CLOSURE             0 (num)
             31 BUILD_TUPLE              1
             34 LOAD_CONST               2 (<code object <genexpr> at 
                                            0x10027eeb0, file "<stdin>", 
                                            line 3>)
             37 MAKE_CLOSURE             0
             40 LOAD_GLOBAL              2 (check_list)
             43 GET_ITER            
             44 CALL_FUNCTION            1
             47 CALL_FUNCTION            1
             50 POP_JUMP_IF_FALSE       19

  4          53 LOAD_DEREF               0 (num)
             56 RETURN_VALUE        
             57 JUMP_ABSOLUTE           19
        >>   60 POP_BLOCK           

  5     >>   61 LOAD_CONST               0 (None)
             64 RETURN_VALUE        

Immediately it becomes clear that (a) this code is much less complex, but (b) it also calls three different functions. The first is a single call to xrange, made just once; the other two calls happen inside the outermost for loop. One is the call to all; the other, I suspect, is the generator expression's next method being called. But it doesn't really matter what the functions are; what matters is that they are called inside the loop.

Now, you might think "What's the big deal?" here. It's just a function call; a few nanoseconds here or there -- right? But in fact, those nanoseconds add up. Since the outermost loop proceeds through a total of 232792560 / 20 == 11639628 loops, any overhead gets multiplied by more than eleven million. A quick timing using the %timeit command in ipython shows that a function call -- all by itself -- costs about 120 nanoseconds on my machine:

>>> def no_call():
...     pass
... 
>>> def calling():
...     no_call()
...     
>>> %timeit no_call()
10000000 loops, best of 3: 107 ns per loop
>>> %timeit calling()
1000000 loops, best of 3: 235 ns per loop

So for every function call that appears inside the while loop, that's 120 nanoseconds * 11000000 = 1.32 seconds more time spent. And if I'm right that the second function call is a call to the generator expression's next method, then that function is called even more times, once for every iteration through the genex -- probably 3-4 times per loop on average.
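The same measurement can be reproduced outside IPython with the stdlib timeit module; absolute numbers will differ by machine, so treat this as a sketch of the method rather than canonical figures:

```python
import timeit

def no_call():
    pass

def calling():
    no_call()  # identical body, plus one extra function call

n = 1_000_000
base = timeit.timeit(no_call, number=n) / n     # cost of the empty function itself
wrapped = timeit.timeit(calling, number=n) / n  # same work plus one call
overhead = wrapped - base  # approximate cost of a single function call, in seconds
```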

Now to test this hypothesis. If function calls are the problem, then eliminating function calls is the solution. Let's see...

def find_solution(step):
    for num in xrange(step, 999999999, step):
        for n in check_list:
            if num % n != 0:
                break
        else:
            return num
    return None

Here's a version of find_solution that does almost exactly what the original does using for/else syntax. The only function call is the outer one, to xrange, which shouldn't cause any problems. Now, when I timed the original version, it took 11 seconds:

found an answer: 232792560
took 11.2349967957 seconds

Let's see what this new, improved version manages:

found an answer: 232792560
took 2.12648200989 seconds

That's a hair faster than the performance of your fastest version of ProjectEulerFive on my machine:

232792560
took 2.40848493576 seconds

And everything makes sense again.

senderle
  • @MVTCplusplus, see also [here](http://stackoverflow.com/a/11493585/577088) for what I believe is the fastest approach to this problem. – senderle Jul 15 '12 at 16:25

This ought to take almost no time:

def gcd(a, b):
    if (b == 0): return a
    else: return gcd(b, a%b)

def lcm(a, b):
    return abs(a*b) / gcd(a, b)

def euler5(n):
    if (n == 1): return 1
    else: return lcm(n, euler5(n-1))

print euler5(20)
user448810
  • Perfect solution. Takes 0.000147819519043 seconds on my system. – MVTC Jul 14 '12 at 22:04
  • A way to optimize further is to turn all of your recursive functions into iterative ones. This is because Python doesn't do tail call optimization and you can eliminate the function call overhead by using iteration. – Joel Cornett Jul 14 '12 at 22:07
  • No need to write your own `gcd`; there's one in the `fractions` module. [Admittedly it's never easy to know where to draw the line in terms of what you should allow yourself access to for Euler problems..] – DSM Jul 14 '12 at 22:18
  • @JoelCornett: Functions that are defined recursively are most clearly written recursively. Lack of tail call optimization is one of the reasons I prefer Scheme to Python. – user448810 Jul 14 '12 at 22:41
  • @DSM: It's simpler to write a function like this than to look up in the manual where to import it. – user448810 Jul 14 '12 at 22:42
  • @user448810: You're right, but my point was that in the interests of optimization, you go with the iterative solution. My `timeit` tests show a roughly 30% increase in speed with an iterative solution. Additionally, an iterative solution is not that unclear. Consider this: `def gcd(a, b):\n\twhile b:\n\t\ta, b = b, a % b\n\treturn a` – Joel Cornett Jul 14 '12 at 22:52
  • @JoelCornett: These functions are simple enough that it doesn't matter. But look sometime at avl trees in [recursive Scheme](http://programmingpraxis.com/2011/11/25/avl-trees/) and [iterative C](http://eternallyconfuzzled.com/tuts/datastructures/jsw_tut_avl.aspx) to see a situation where it does matter. I prefer the readable but slower recursive version; fortunately, I normally use a language that doesn't make me choose. – user448810 Jul 14 '12 at 23:10
  • Thanks, I'll definitely check it out. – Joel Cornett Jul 14 '12 at 23:32
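For reference, the iterative gcd discussed in the comments, combined with a reduce-based fold instead of the recursive euler5, looks like this in Python 3 (a sketch; the original answer's recursive Python 2 version is equivalent):

```python
from functools import reduce

def gcd(a, b):
    # iterative Euclidean algorithm: no recursion, no per-step call overhead
    while b:
        a, b = b, a % b
    return a

def lcm(a, b):
    return a * b // gcd(a, b)

def euler5(n):
    # fold lcm over 1..n instead of recursing
    return reduce(lcm, range(1, n + 1), 1)
```

euler5(20) gives 232792560, matching the recursive version.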

Not an answer to your question (hence the community wiki), but here is a useful decorator for timing functions:

from functools import wraps
import time

def print_time(f):
    @wraps(f)
    def wrapper(*args, **kwargs):
        t0 = time.time()
        result = f(*args, **kwargs)
        print "{0} took {1}s".format(f.__name__, time.time() - t0)
        return result
    return wrapper

Usage is as follows:

@print_time
def foo(x, y):
    time.sleep(1)
    return x + y

And in practice:

>>> foo(1, 2)
foo took 1.0s
3
Joel Cornett

You can solve this problem using prime factors. It solves for n=20 in 0.0004s and for n=50 in 0.0011s.

from math import sqrt
import time
num = int(input("Number: "))
start_time = time.clock()

def is_prime(n):
    if(n == 2 or n == 3):
        return True
    elif(n < 2 or n % 2 == 0):
        return False
    for i in range(3, int(sqrt(n))+1, 2):
        if(n % i == 0):
            return False
    return True

def decompose(n):
    if(n == 1 or n == 0):
        return [n]
    l = []
    residue = n
    while(residue != 1):
        for i in range(1, residue+1):
            if(residue%i==0 and is_prime(i)):
                l.append(i)
                residue //= i
                break
    return l

l = []
for i in range(1, num):
    d = decompose(i)
    for n in d:
        if(l.count(n) < d.count(n)):
            l += [n]*(d.count(n)-l.count(n))
result = 1
for i in l:
    result*=i
print(result)

print("time: ", time.clock()-start_time)
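The count-comparison loop above is building, for each prime, the maximum multiplicity it has in any single factorization up to the limit. collections.Counter can express that directly; here is a Python 3 sketch using a simple trial-division decompose (the function names are mine, not from the answer above):

```python
from collections import Counter
from functools import reduce

def decompose(n):
    # trial-division prime factorization, returned as a Counter of multiplicities
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1
    return factors

def smallest_multiple(limit):
    maxed = Counter()
    for i in range(2, limit + 1):
        maxed |= decompose(i)  # Counter union keeps the max count per prime
    return reduce(lambda acc, pk: acc * pk[0] ** pk[1], maxed.items(), 1)
```

smallest_multiple(20) gives 232792560, the same result as the loop above.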

This is my implementation.

I understand why it works, but would appreciate enlightenment on the math behind it.

If I write down all prime numbers not greater than the limit, then replace each prime with the largest power of it that does not exceed the limit, the product of those prime powers is divisible by every number up to the limit.

from functools import reduce

def divisible_by_all_up_to(limit):
    def is_prime(num):
        if num == 2 or num == 3:
            return True
        if num % 2 == 0 or num < 2:
            return False
        for i in range(3, num):
            if num % i == 0:
                return False
        return True
    # include limit itself when searching for primes
    primes = [i for i in range(limit + 1) if is_prime(i)]
    mult = []
    for p in primes:
        # raise p to the largest power that does not exceed limit
        value = p
        while value * p <= limit:
            value *= p
        mult.append(value)
    return mult

ans = divisible_by_all_up_to(20)

resp = reduce(lambda x, y: x*y, ans)
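The math behind it: lcm(1..n) is the product, over every prime p ≤ n, of the largest power of p not exceeding n, i.e. p raised to floor(log_p(n)). Each prime contributes the highest power that any single number in 1..n demands. A Python 3 sketch of exactly that formula (function names are mine, for illustration):

```python
def lcm_up_to(n):
    def is_prime(k):
        # trial division up to sqrt(k)
        return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))
    result = 1
    for p in filter(is_prime, range(2, n + 1)):
        power = p
        while power * p <= n:  # climb to the largest p**e with p**e <= n
            power *= p
        result *= power
    return result
```

For n = 20 this yields 2**4 * 3**2 * 5 * 7 * 11 * 13 * 17 * 19 = 232792560.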
p2327