In Python, you can use the in
keyword for checking whether a string contains a substring. A naive implementation, but fast on average, would be something like
def in(small, large):
for i in range(len(large)):
if all(c1 == c2 for c1, c2 in zip(small, large[i:])):
return True
return False
The run time of this substring matching algorithm is quadratic in the worst case. However, when I try the adversarial example
In [10]: sub = 'a'*10000 + 'b'
In [11]: full = 'a'*1000000
In [12]: sub in full
Out[12]: False
I get the correct answer near instanteneously.
This suggests that under the hood, Python uses a faster algorithm like KMP to check for the presence of this substring. This strikes me as weird, because it seems that the extra computation time for KMP, even if good for adversarial examples like these, is almost never worth it in practice.
So what's going on with Python in
?