Your solution has the worst-case O(nm) time complexity, that is O(n^2) if n ~ m. You can easily reduce it to O(n log(n)) and even O(log(n)). Here is how.
Consider a list of words (your comps
attrubute) and a target (your b
)
words = ['abdc', 'abd', 'acb', 'abcabc', 'abc']
target = "abcd"
Observe, that by sorting the list of words in lexicographical order, you get a list of prefixes
prefixes = ['abc', 'abcabc', 'abd', 'abdc', 'acb']
It is degenerate, because prefixes[0]
is a prefix of prefixes[1]
, hence everything that starts with prefixes[1]
starts with prefixes[0]
just as well. This is a bit problematic. Let's see why. Let's use the fast (binary) search to find the proper place of the target in the prefix
list.
import bisect
bisect.bisect(prefixes, target) # -> 2
This is because the target
and prefixes[1]
share a prefix, but target[3] > prefixes[1][3]
, hence lexicographically it should go after. Hence, if there is a prefix of the target
in the prefixes
, it should be to the left of index 2
. Obviously, the target
doesn't start with prefixes[1]
hence in the worst case we would have to search all the way to the left to find whether there is a prefix. Now observe, that if we transform these prefixes
into a nondegenerate list, the only possible prefix of a target will always be to the left of the position returned by bisect.bisect
. Let's reduce the list of prefixes and write a helper function that will check whether there is a prefix of a target.
from functools import reduce
def minimize_prefixes(prefixes):
"""
Note! `prefixes` must be sorted lexicographically !
"""
def accum_prefs(prefixes, prefix):
if not prefix.startswith(prefixes[-1]):
return prefixes.append(prefix) or prefixes
return prefixes
prefs_iter = iter(prefixes)
return reduce(accum_prefs, prefs_iter, [next(prefs_iter)]) if prefixes else []
def hasprefix(minimized_prefixes, target):
position = bisect.bisect(minimized_prefixes, target)
return target.startswith(minimized_prefixes[position-1]) if position else False
Now let's see
min_prefixes = minimize_prefixes(prefixes)
print(min_prefixes) # -> ['abc', 'abd', 'acb']
hasprefix(min_prefixes, target) # -> True
Let's make a test that must fail:
min_prefs_fail = ["abcde"]
hasprefix(min_prefs_fail, target) # -> False
This way you get O(n log(n)) search, which is asymptotically faster than your O(n^2) solution. Note! You can (and you really should) store the minimize_prefixes(sorted(comps))
prefix set as an attribute in your object, making any prefix search O(log (n)), which is even more faster than what you have now.