Are sets really faster than lists?

Question

I've been told that Python sets are faster then lists when it comes to membership testing.

Despite that, timeit shows that for a large amount of values lists are actually faster.

For smaller set with more repetitions the difference is smaller and even reversed, but still, no significant advantage to sets (and I guess performance issues are more important for very large sets of data, isn't it?)

How can that data be explained?

>>> import timeit
>>> # Few repetitions on a bigger set:
>>> timeit.timeit('10000 in set(range(10000000))', number=10)
9.265543753999737
>>> timeit.timeit('10000 in list(range(10000000))', number=10)
4.788996731000225
>>> # More repetitions on a smaller set:
>>> timeit.timeit('10000 in set(range(10000))', number=100000)
32.068307194000226
>>> timeit.timeit('10000 in list(range(10000))', number=100000)
32.45919990500079

`set`s are faster *after* you've built them- take that out of your timing — Chris_Rands, Nov 28 '18 at 22:45
You are including the creation time, `set(range(10000000))` and `list(range(10000000))`, which is what dominates the timing, you need to isolate the membership testing to see what is really better — juanpa.arrivillaga, Nov 28 '18 at 22:46
if you're only doing a single membership test, building a `set` may not be the way forwards, but if you do many tests, you want to build a `set` first — Chris_Rands, Nov 28 '18 at 22:47
@Chris_Rands well in this case, membership testing for `range` is the fastest regardless! — juanpa.arrivillaga, Nov 28 '18 at 22:48
@juanpa.arrivillaga for this contrived example yes (in Python 3) — Chris_Rands, Nov 28 '18 at 22:48

score 5 · Accepted Answer · answered Nov 28 '18 at 22:48

5

What you've been told is correct, searching in a set is O(1) since members are stored using a hash table. Searching in an (unsorted) array is O(n).

The problem with your tests is that you're both creating the set/array and searching it in the same line. In this case, you're both testing the speed of inserting all the items, and then searching for a single entry.

Try something like this instead:

test_range = range(10000000)
test_set = set(test_range)
test_array = list(test_range)

timeit.timeit('10000 in test_set', number=10)
timeit.timeit('10000 in test_array', number=10)

answered Nov 28 '18 at 22:48

Evan

105
6

2

searching is still O(N) for Python *lists*, sorted or not. The implementation of the `in` operator doesn't choose between binary search/linear search, it always does linear search – juanpa.arrivillaga Nov 28 '18 at 22:49
Thanks! I tried a slightly different code (because `timeit` doesn't accept anything outside the scope of it's arguments) and the results were of course drastically in favor of the set. I just don't like to take things as granted I guess. Going to read about hash tables now :) – senior_citizen_ Dec 03 '18 at 22:52

Are sets really faster than lists?

1 Answers1