I need to store a large list of numbers in memory. I will then need to check for membership. Arrays are better than lists for memory efficiency. Sets are better than lists for membership checking. I need both! So my questions are:
1) How much more memory efficient are arrays than sets? (For the converse, see my results below). 2) Is there a data structure which strikes a better balance between sets and arrays? Something like a set with a signed integer type? Or some numpy construct?
I checked out the membership timing difference with the script below. (I know timeit is better but the variance is low enough for time to be fine):
import array
import time
class TimerContext:
def __enter__(self):
self.t0 = time.time()
def __exit__(self, *args, **kwargs):
print(time.time()-self.t0)
SIZE = 1000000
l = list([i for i in range(SIZE)])
a = array.array('I', l)
s = set(l)
print(type(l))
print(type(a))
print(type(s))
with TimerContext():
x = 99999 in l
with TimerContext():
x = 99999 in a
with TimerContext():
x = 99999 in s
Results:
<class 'list'>
<class 'array.array'>
<class 'set'>
0.0012176036834716797
0.0024595260620117188
1.430511474609375e-06
So sets are a LOT faster for membership checking (please note the scientific notation). So if their memory footprint isn't that different to the array, I'll prefer to use a set. But I don't know how to check the memory footprint.
I should also add that there are a lot of questions comparing sets and lists. But I didn't see any good answers comparing arrays and sets.