Keep in mind that Python's substring search x in a is already optimized pretty well for the general case (and coded in C, for CPython), so you're unlikely to beat it, especially with pure Python code.
However, if you have a more specialized case, you can do much better.
For example, if you have an arbitrary list of millions of strings b that all need to be searched for within one giant static string a that never changes, preprocessing a can make a huge difference. (Note that this is the opposite of the usual case, where preprocessing the patterns is the key.)
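As a minimal sketch of what "preprocessing a" can mean: index every k-gram of the static haystack once, then each lookup only has to verify the handful of positions where the pattern's first k-gram occurs, instead of scanning all of a. The choice k=4 and the function names here are arbitrary, not from any library:

```python
from collections import defaultdict

def build_index(a, k=4):
    """Map every k-gram of the static haystack a to its start positions.

    Built once; k=4 is an arbitrary choice for this sketch -- tune it
    to your pattern lengths and alphabet.
    """
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    return index

def contains(a, index, pattern, k=4):
    """Check whether pattern occurs in a, using the k-gram index."""
    if len(pattern) < k:
        return pattern in a  # too short for the index; fall back to `in`
    # Only positions where the pattern's first k-gram occurs can match.
    return any(a.startswith(pattern, i) for i in index[pattern[:k]])
```

A pattern whose first k-gram never appears in a is rejected without touching a at all, which is exactly the case you care about when most of the millions of searches will fail. (A suffix array or suffix automaton over a gives stronger guarantees, at the cost of more implementation work.)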
On the other hand, if you expect matches to be unlikely, and you've got the whole b list in advance, you can probably get some large gains by organizing b in some way. For example, there's no point searching for "ABCD" if "ABC" already failed; if you need to search both "ABC" and "ABD", you can search for "AB" first and then check whether it's followed by "C" or "D", so you don't have to repeat yourself; etc. (It might even be possible to merge all of b into a single regular expression that's close enough to optimal… although with millions of elements, that probably isn't the answer.)
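The prefix-sharing idea above is essentially a trie. A rough sketch, assuming the sentinel "$" never appears in your patterns (the names and the "$" convention are mine, not from any library):

```python
def build_trie(patterns):
    """Nested-dict prefix trie; a "$" key marks the end of a pattern."""
    root = {}
    for p in patterns:
        node = root
        for ch in p:
            node = node.setdefault(ch, {})
        node["$"] = True  # assumes "$" is not a pattern character
    return root

def matches_at(a, i, trie):
    """Yield every pattern in the trie that starts at position i of a."""
    node, j = trie, i
    while True:
        if "$" in node:
            yield a[i:j]
        if j == len(a) or a[j] not in node:
            return
        node = node[a[j]]
        j += 1

def find_all(a, trie):
    """All (position, pattern) hits in one left-to-right scan of a.

    At each position, a shared prefix like "AB" is walked only once,
    however many patterns start with it -- and if it fails, every
    pattern under it is pruned together.
    """
    return [(i, p) for i in range(len(a)) for p in matches_at(a, i, trie)]
```

For the single-regex variant mentioned above, re.compile("|".join(map(re.escape, b))) builds the alternation, and Python's re engine will do some of this prefix factoring for you; with millions of alternatives, though, compile time and memory usually rule it out.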
But it's hard to guess in advance, with no more information than you've given us, exactly what algorithm you want.
Wikipedia has a pretty good high-level overview of string-searching algorithms. There's also a website devoted to pattern matching in general, which seems to be a bit out of date, but then I doubt you're going to turn out to need an algorithm invented in the past 3 years anyway.