I was writing a simple Sieve of Eratosthenes in which one produces a list of all of primes up to some number N without performing any division or modular arithmetic. Abstractly, my implementation uses an array of N boolean values which all start out False and are eventually flipped to True over the course of the algorithm.
I wanted to know if this would be faster and/or use less memory if I implemented it as a list
of 0
and 1
, a list
of True
and False
, or a bytearray
of 0
and 1
.
Timing (python 2.7)
Using python 2.7, I found the following when using N = 10k and N = 30M:
$ python -m timeit -s 'import sieve' -n 10 'sieve.soe_list(3000000)'
10 loops, best of 3: 1.42 sec per loop
$ python -m timeit -s 'import sieve' -n 10 'sieve.soe_byte(3000000)'
10 loops, best of 3: 1.23 sec per loop
$ python -m timeit -s 'import sieve' -n 10 'sieve.soe_list2(3000000)'
10 loops, best of 3: 1.65 sec per loop
$ python -m timeit -s 'import sieve' -n 500 'sieve.soe_list(10000)'
500 loops, best of 3: 3.59 msec per loop
$ python -m timeit -s 'import sieve' -n 500 'sieve.soe_byte(10000)'
500 loops, best of 3: 4.12 msec per loop
$ python -m timeit -s 'import sieve' -n 500 'sieve.soe_list2(10000)'
500 loops, best of 3: 4.25 msec per loop
10k 3M
byte (01) 4.12 ms 1.23 s
list (01) 3.59 ms 1.42 s
list (TF) 4.25 ms 1.65 s
What surprises me is that for small values of N, the list
of integers was the best, and for large values of N, the bytearray
was the best. The list
of True and False was always slower.
Timing (python 3.3)
I also repeated the test in python 3.3:
$ python -m timeit -s 'import sieve' -n 10 'sieve.soe_list(3000000)'
10 loops, best of 3: 2.05 sec per loop
$ python -m timeit -s 'import sieve' -n 10 'sieve.soe_byte(3000000)'
10 loops, best of 3: 1.76 sec per loop
$ python -m timeit -s 'import sieve' -n 10 'sieve.soe_list2(3000000)'
10 loops, best of 3: 2.02 sec per loop
$ python -m timeit -s 'import sieve' -n 500 'sieve.soe_list(10000)'
500 loops, best of 3: 5.19 msec per loop
$ python -m timeit -s 'import sieve' -n 500 'sieve.soe_byte(10000)'
500 loops, best of 3: 5.34 msec per loop
$ python -m timeit -s 'import sieve' -n 500 'sieve.soe_list2(10000)'
500 loops, best of 3: 5.16 msec per loop
10k 3M
byte (01) 5.34 ms 1.76 s
list (01) 5.19 ms 2.05 s
list (TF) 5.16 ms 2.02 s
Here there was the same ordering with the list
being better for small N, and the bytearray
for large N, but the list
with True
and False
was not significantly different than the list
with 1
and 0
.
Memory Use
The memory use was exactly the same in python 2.7 and 3.3. I used sys.getsizeof
on the list
or bytearray
, which was the same size at the beginning and end of the algorithm.
>>> import sieve
>>> sieve.verbose = True
>>> x = sieve.soe_list(10000)
soe_list, 10000 numbers, size = 83120
>>> x = sieve.soe_byte(10000)
soe_byte, 10000 numbers, size = 10993
>>> x = sieve.soe_list2(10000)
soe_list2, 10000 numbers, size = 83120
>>> x = sieve.soe_list(3000000)
soe_list, 3000000 numbers, size = 26791776
>>> x = sieve.soe_byte(3000000)
soe_byte, 3000000 numbers, size = 3138289
>>> x = sieve.soe_list2(3000000)
soe_list2, 3000000 numbers, size = 26791776
10k 3M
byte (01) ~11k ~3.1M
list (01) ~83k ~27M
list (TF) ~83k ~27M
I was a bit surprised~ that the large bytearray
used more memory than the large list
given that the large bytearray
was faster.
EDIT: Oops, as pointed out in the comments, I read my own values wrong and interpreted 27M as 2.7M. The list is really much bigger.
The Question
Can anyone explain why this algorithm runs faster using a list
for small N, and faster using a bytearray
for large N?
Test code for reference
sieve.py:
import sys
if sys.version_info.major == 3:
xrange = range
verbose = False
def soe_byte(upper):
numbers = bytearray(0 for _ in xrange(0,upper+1))
if verbose:
print("soe_byte, {} numbers, size = {}".format(upper, sys.getsizeof(numbers)))
primes = []
cur = 2
while cur <= upper:
if numbers[cur] == 1:
cur += 1
continue
primes.append(cur)
for i in xrange(cur,upper+1,cur):
numbers[i] = 1
return primes
def soe_list(upper):
numbers = list(0 for _ in xrange(0,upper+1))
if verbose:
print("soe_list, {} numbers, size = {}".format(upper, sys.getsizeof(numbers)))
primes = []
cur = 2
while cur <= upper:
if numbers[cur] == 1:
cur += 1
continue
primes.append(cur)
for i in xrange(cur,upper+1,cur):
numbers[i] = 1
return primes
def soe_list2(upper):
numbers = list(False for _ in xrange(0,upper+1))
if verbose:
print("soe_list2, {} numbers, size = {}".format(upper, sys.getsizeof(numbers)))
primes = []
cur = 2
while cur <= upper:
if numbers[cur] == True:
cur += 1
continue
primes.append(cur)
for i in xrange(cur,upper+1,cur):
numbers[i] = True
return primes