I was under the impression that `startswith` has to be faster than `in`, for the simple reason that `in` has to do more checks (it allows the word being looked for to be anywhere in the string). But I had my doubts, so I decided to `timeit`. The code for the timings is given below; as you will probably notice, I haven't done much timing, and the code is rather simple.
import timeit
setup1='''
def in_test(sent, word):
    if word in sent:
        return True
    else:
        return False
'''
setup2='''
def startswith_test(sent, word):
    if sent.startswith(word):
        return True
    else:
        return False
'''
print(timeit.timeit('in_test("this is a standard sentence", "this")', setup=setup1))
print(timeit.timeit('startswith_test("this is a standard sentence", "this")', setup=setup2))
Results:
>> in: 0.11912814951705597
>> startswith: 0.22812353561129417
So `startswith` is twice as slow! I find this behavior very puzzling given what I said above. Am I doing something wrong in timing the two, or is `in` indeed faster? If so, why?
Note that the results are very similar even when both return `False` (in this case, `in` would have to actually traverse the whole sentence, in case it had simply short-circuited before):
print(timeit.timeit('in_test("another standard sentence, would be that", "this")', setup=setup1))
print(timeit.timeit('startswith_test("another standard sentence, would be that", "this")', setup=setup2))
>> in: 0.12854891578786237
>> startswith: 0.2233201940338861
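For what it's worth, the same comparison could be run a bit more carefully by timing the bare expressions (no wrapper functions) and keeping the best of several repeats. This is only a sketch and not what produced the numbers above; `best_time`, `SENT_HIT`, `SENT_MISS` and `WORD` are names I made up for illustration, and the `number` and `repeat` values are arbitrary.

```python
import timeit

SENT_HIT = "this is a standard sentence"
SENT_MISS = "another standard sentence, would be that"
WORD = "this"


def best_time(stmt, sent, number=100_000, repeat=5):
    """Return the fastest of `repeat` runs, each executing `stmt` `number` times."""
    # The statement sees only these two names, so no setup string is needed.
    namespace = {"sent": sent, "word": WORD}
    return min(timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat))


# Time both expressions on a matching and a non-matching sentence.
results = {
    (label, case): best_time(stmt, sent)
    for label, stmt in (("in", "word in sent"), ("startswith", "sent.startswith(word)"))
    for case, sent in (("hit", SENT_HIT), ("miss", SENT_MISS))
}

for (label, case), seconds in results.items():
    print(f"{label:<10} {case:<4} {seconds:.4f}")
```

Taking the minimum rather than the mean is the usual way to reduce noise from other processes, since the fastest run is the one least disturbed by them.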
If I had to implement the two functions from scratch, they would look something like this (pseudocode):
`startswith`: start comparing the letters of word to the letters of sentence one by one until a) word is exhausted (return True) or b) a comparison fails (return False).
`in`: call `startswith` for every position where the initial letter of word can be found in sentence.
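Written out in Python, that pseudocode would be roughly the following. It is only a sketch of my mental model, not how CPython actually implements either operation; `naive_startswith` and `naive_in` are hypothetical names.

```python
def naive_startswith(sent, word):
    """Compare word against the beginning of sent, one character at a time."""
    if len(word) > len(sent):
        return False  # word cannot fit at the start of sent
    for i, ch in enumerate(word):
        if sent[i] != ch:
            return False  # case b): a comparison failed
    return True  # case a): word was exhausted


def naive_in(sent, word):
    """Call naive_startswith at every position where word's first letter occurs."""
    if not word:
        return True  # the empty string is contained in every string
    # Only positions where word could still fit are worth checking.
    for start in range(len(sent) - len(word) + 1):
        if sent[start] == word[0] and naive_startswith(sent[start:], word):
            return True
    return False


print(naive_startswith("this is a standard sentence", "this"))
print(naive_in("another standard sentence, would be that", "this"))
```

Under this model `naive_in` does at least as much work as `naive_startswith` whenever the word sits at the very start of the sentence, which is why the timings surprise me.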
I just don't get it..
Just to make it clear, `in` and `startswith` are not equivalent; I am only talking about the case where the word one is trying to find has to be the first in a string.