I'm comparing a list of strings words_to_lookup
to a list of strings word_list
and for every match I'll do something.
I wonder whether direct string comparison or checking existence in a set will be more time efficient.
String comparison is O(n)
for w in word_list:
for s in words_to_lookup:
if s == w:
# do something
Checking existence in a set is O(1) but getting the hash of the string (probably) takes O(n).
w_set = set(word_list)
for s in words_to_lookup:
if s in w_set:
# do something
So is string comparison an efficient approach? If the set approach is faster, how?
EDIT:
I was not thinking clearly when I first posted the question. Thank you for the comments. I find it hard to convert my real problem to a concise one suited for online discussions. Now I made the edit, the answer is obvious that the first approach is O(n^2) and the second approach is O(n).
My real question should have been this: Can hash tables really be O(1)?.
And the answer is:
The hash function however does not have to be O(m) - it can be O(1). Unlike a cryptographic hash, a hash function for use in a dictionary does not have to look at every bit in the input in order to calculate the hash. Implementations are free to look at only a fixed number of bits.