2

I've been told that Python sets are faster then lists when it comes to membership testing.

Despite that, timeit shows that for a large amount of values lists are actually faster.

For smaller set with more repetitions the difference is smaller and even reversed, but still, no significant advantage to sets (and I guess performance issues are more important for very large sets of data, isn't it?)

How can that data be explained?

>>> import timeit
>>> # Few repetitions on a bigger set:
>>> timeit.timeit('10000 in set(range(10000000))', number=10)
9.265543753999737
>>> timeit.timeit('10000 in list(range(10000000))', number=10)
4.788996731000225
>>> # More repetitions on a smaller set:
>>> timeit.timeit('10000 in set(range(10000))', number=100000)
32.068307194000226
>>> timeit.timeit('10000 in list(range(10000))', number=100000)
32.45919990500079
jonrsharpe
  • 115,751
  • 26
  • 228
  • 437

1 Answers1

5

What you've been told is correct, searching in a set is O(1) since members are stored using a hash table. Searching in an (unsorted) array is O(n).

The problem with your tests is that you're both creating the set/array and searching it in the same line. In this case, you're both testing the speed of inserting all the items, and then searching for a single entry.

Try something like this instead:

test_range = range(10000000)
test_set = set(test_range)
test_array = list(test_range)

timeit.timeit('10000 in test_set', number=10)
timeit.timeit('10000 in test_array', number=10)
Evan
  • 105
  • 6
  • 2
    searching is still O(N) for Python *lists*, sorted or not. The implementation of the `in` operator doesn't choose between binary search/linear search, it always does linear search – juanpa.arrivillaga Nov 28 '18 at 22:49
  • Thanks! I tried a slightly different code (because `timeit` doesn't accept anything outside the scope of it's arguments) and the results were of course drastically in favor of the set. I just don't like to take things as granted I guess. Going to read about hash tables now :) – senior_citizen_ Dec 03 '18 at 22:52