Python 3 - String Distances for Sentences - Execution Time Improvement

Question

I am writing a function to determine the "similarity" between 2 sentences. I began by simply using Python's SequenceMatcher from difflib, but I got poor results for sentences where the words were "swapped".

For example:

The ball was hit by Carl
Carl hit the ball

So I wrote the following function to solve this issue:

import itertools
from difflib import SequenceMatcher


def similarity(a, b):
    a = a.lower()
    b = b.lower()

    a_tokens = a.split(" ")

    # Try every word order of `a` and keep the best ratio against `b`.
    # is_stop_word is defined elsewhere (not shown).
    result = 0
    for permutation in itertools.permutations(a_tokens):
        ratio = SequenceMatcher(is_stop_word, " ".join(permutation), b).ratio()
        if ratio > result:
            result = ratio
    return result

The function works fine and gives better results, but I am concerned it could take too long for big inputs, since the number of permutations grows factorially with the number of words. Any recommendations on how to improve it?

Your similarity rating is very vague - should the sentence *"The ball hit Carl"* be __closer__ to *"Carl hit the ball"* than *"The ball was hit by Carl"*, even though they mean opposite things? — Billy, Nov 15 '16 at 15:04
Are you looking to compare by meaning? SequenceMatcher cannot compare two sentences by their meaning; it treats them only as two sequences. If you are, check this out: [link](http://stackoverflow.com/questions/8897593/similarity-between-two-text-documents) — Priyank, Nov 15 '16 at 15:12
I am not trying to make a semantic comparison. I just want a string distance function that is robust against word swaps. — NMO, Nov 15 '16 at 15:28
Try this answer? [taken from prev link](http://stackoverflow.com/a/8897648/5699807) — Priyank, Nov 15 '16 at 15:36
"string distance function" with "robust against word swaps" seems to be a bit of an oxymoron - they're competing concepts, if not completely orthogonal. As already suggested, you need to start with a precise definition of what metric you are trying to calculate. — twalberg, Nov 15 '16 at 16:02
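
One possible reading of "robust against word swaps" is to put both sentences into the same canonical word order before comparing them, which needs a single SequenceMatcher call instead of one per permutation. The sketch below sorts the words alphabetically. It is only an illustration under that assumption: `token_sort_similarity` is a hypothetical name, the result is a different metric from the maximum over all permutations, and the stop-word filter is left out because `is_stop_word` is not shown in the question.

```python
from difflib import SequenceMatcher


def token_sort_similarity(a, b):
    """Compare two sentences after sorting their words alphabetically."""
    # Sorting gives both sentences one canonical word order, so word swaps
    # no longer affect the comparison and no permutations are needed.
    a_sorted = " ".join(sorted(a.lower().split()))
    b_sorted = " ".join(sorted(b.lower().split()))
    return SequenceMatcher(None, a_sorted, b_sorted).ratio()


# Sentences from the question and from the first comment.
print(token_sort_similarity("The ball was hit by Carl", "Carl hit the ball"))
print(token_sort_similarity("Carl hit the ball", "The ball hit Carl"))  # 1.0
```

The cost is dominated by sorting the words plus one SequenceMatcher comparison, rather than one comparison per permutation. As the comments point out, this treats "Carl hit the ball" and "The ball hit Carl" as identical, so it only fits if word order really should not matter.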
