0

I have a book on Python which says:

in is a very fast operation on sets:

stopwords_list = ["a", "an"] + hundreds_of_other_words + ["yet", "you"]
"zip" in stopwords_list # False, but have to check every element
stopwords_set = set(stopwords_list)
"zip" in stopwords_set # Very fast to check

I have two questions:

  1. Why is in faster on sets than on lists?
  2. If the in operator really is faster on sets, then why don't the makers of Python just rewrite the in method for lists to do x in set(list)? Why can't the idea in this book just be made part of the language?
jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
s7eqx9f74nc4
  • 123
  • 5

1 Answers1

2

set uses hash tables to get an average O(1) lookup instead of O(n) lookup with lists, that's why it's way faster, provided the elements are hashable (read more: What makes sets faster than lists?).

Now converting a list into a set when in is called requires the full list to be parsed and converted to set, so it's even slower that searching in a list (even if the element is at the start of the list, the conversion is done on all the elements so there's no short-circuiting).

There's no magic. Converting a list into a set is useful, but only at init phase, and then the lookup is done in the set multiple times during the processing.

But in that case, creating a set directly is the best way.

Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219