I am writing a function to determine the "similarity" between two sentences. I began by simply using Python's SequenceMatcher from difflib, but I got poor results for sentences where the words were "swapped".
For example:
- The ball was hit by Carl
- Carl hit the ball
So I wrote the following function to solve this issue:
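import itertools
from difflib import SequenceMatcher

def similarity(a, b):
    # Compare case-insensitively.
    a = a.lower()
    b = b.lower()
    a_tokens = a.split(" ")
    result = 0
    # Try every word ordering of a (n! orderings for n words) and keep the best ratio.
    for permutation in itertools.permutations(a_tokens):
        a_permutation = " ".join(permutation)
        # is_stop_word is defined elsewhere (not shown here). Since the inputs are
        # strings, SequenceMatcher passes it single characters, not whole words.
        ratio = SequenceMatcher(lambda w: is_stop_word(w), a_permutation, b).ratio()
        if ratio > result:
            result = ratio
    return result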
The function works and gives better results, but I am concerned it could take too long for long inputs, since the number of permutations grows factorially with the number of words in the first sentence. Any recommendations on how to improve it?
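To make the concern concrete, here is a rough sketch that counts how many orderings the loop has to try for a given sentence. The helper name `permutation_count` is only for illustration and is not part of the function above; it assumes the same `split(" ")` tokenization.

```python
import math

def permutation_count(sentence):
    """Number of word orderings that similarity() would try for this sentence."""
    # itertools.permutations treats repeated words as distinct positions,
    # so n words always yield n! orderings.
    return math.factorial(len(sentence.split(" ")))

for sentence in ("Carl hit the ball", "The ball was hit by Carl"):
    words = len(sentence.split(" "))
    print(words, "words ->", permutation_count(sentence), "SequenceMatcher calls")
```

Each ordering costs one full SequenceMatcher comparison, so the total work grows with the factorial of the word count rather than with the sentence length itself.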