Why does set( ) make this code run so much faster?

Question

I wrote some code for Project Euler Problem 35:

#Project Euler: Problem 35

import time

start = time.time()

def sieve_erat(n):
    '''creates list of all primes < n'''
    x = range(2,n)
    b = 0
    while x[b] < int(n ** 0.5) + 1:
        x = filter(lambda y: y % x[b] != 0 or y == x[b], x)
        b += 1
    else:
        return x

def circularPrimes(n):
    '''returns # of circular primes below n'''
    count = 0
    primes = sieve_erat(n)
    b = set(primes)
    for prime in primes:
        inc = 0
        a = str(prime)
        while inc < len(a):
            if int(a) not in b:
                break
            a = a[-1] + a[0:len(a) - 1]
            inc += 1
        else:
            count += 1
    else:
        return count

print circularPrimes(1000000)
elapsed = (time.time() - start)
print "Found in %s seconds" % elapsed

I am wondering why this code (above) runs so much faster when I set b = set(primes) in the circularPrimes function. The running time for this code is about 8 seconds. Initially, I did not set b = set(primes) and my circularPrimes function was this:

def circularPrimes(n):
    '''returns # of circular primes below n'''
    count = 0
    primes = sieve_erat(n)
    for prime in primes:
        inc = 0
        a = str(prime)
        while inc < len(a):
            if int(a) not in primes:
                break
            a = a[-1] + a[0:len(a) - 1]
            inc += 1
        else:
            count += 1
    else:
        return count

My initial code (without b = set(primes)) ran so long that I didn't wait for it to finish. I am curious as to why there is such a large discrepancy in terms of running time between the two pieces of code as I do not believe that primes would have had any duplicates that would have made iterating through it take so much longer that iterating through set(primes). Maybe my idea of set( ) is wrong. Any help is welcome.

Checking for `contains` is constant time in set, whereas a worst case linear time in a list, which is what I'm assuming you are returning from the function `sieve_erat`. — Sukrit Kalra, Jul 23 '14 at 04:29
See [TimeComplexity python wiki page](https://wiki.python.org/moin/TimeComplexity). — falsetru, Jul 23 '14 at 04:34
Just as a side note you can speed things up by taking advantage of the fact that the primes list is sorted (`sieve_erat` generates it as such). — Eugen Constantin Dinca, Jul 23 '14 at 04:53
possible duplicate of [Why is converting a list to a set faster than using just list to compute a list difference?](http://stackoverflow.com/questions/25294897/why-is-converting-a-list-to-a-set-faster-than-using-just-list-to-compute-a-list) — cHao, Aug 16 '14 at 00:06

score 5 · Accepted Answer · answered Jul 23 '14 at 04:33

5

I believe the culprit here is if int(a) not in b:. Sets are implemented internally as hashtables, meaning that checking for membership is significantly less expensive than with a list (since you just need to check for collision).

You can check out the innards of sets here.

answered Jul 23 '14 at 04:33

jmduke

1,657
1
13
11

@hlove Yes, exactly as jmduke said. With a list, the test for membership scans (possibly whole) list, with set, it is operation with constant execution time, as set has hash working as sort of database index. – Jan Vlcinsky Jul 23 '14 at 07:57

Why does set( ) make this code run so much faster?

1 Answers1