Return the index which gives the maximum score, where the maximum score is the strings which have the most matching characters.
def best_overlap(a, b):
return max([(score(a[offset:], b), offset) for offset in xrange(len(a))], key=lambda x: x[0])[1]
def score(a, b):
return sum([a[i] == b[i] for i in xrange(len(a))])
>>> best_overlap(a, b)
5
>>> a + '-' * best_overlap(a, b); '-' * best_overlap(a, b) + b
'aaaaabbbbbccccc-----'
'-----bbbebcccccddddd'
Or, equivalently:
def best_match(a, b):
max = 0
max_score = 0
for offset in xrange(len(a)):
val = score(a[offset:], b)
if val > max_score:
max_score = val
max = offset
return max
There is room for optimizations such as:
Early exit for no matching characters
Early exit when maximum possible match found