I want to find partial matches between two strings/phrases and score them on a scale of [0, 1]. I tried using difflib's SequenceMatcher for this.

Sample code:

from difflib import SequenceMatcher

# The first argument is the isjunk predicate; here it treats spaces as junk.
out1 = SequenceMatcher(lambda x: x == " ", 'this is a private museum', 'temporary vice prez').ratio()
out2 = SequenceMatcher(lambda x: x == " ", 'this is a private museum', 'museum').ratio()

Here, out1 scores 0.279 and out2 scores 0.4. However, out1 is not a semantic match, whereas out2 is. How can I compare the strings at the word level?

The expected output would be something like out1 = 0 and out2 = 0.4. Scoring should be based on word-level similarity.
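One way to get word-level scoring while staying with difflib is to pass lists of words instead of raw strings, since SequenceMatcher compares any sequences of hashable items. The sketch below assumes simple whitespace tokenization and lowercasing; the helper name `word_ratio` is hypothetical and not part of difflib.

```python
from difflib import SequenceMatcher


def word_ratio(a, b):
    """Similarity in [0, 1] computed over word tokens instead of characters."""
    # Assumption: lowercase + whitespace split is a good enough tokenizer here.
    # SequenceMatcher accepts any sequences of hashable items, so word lists work.
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


out1 = word_ratio('this is a private museum', 'temporary vice prez')
out2 = word_ratio('this is a private museum', 'museum')
print(out1, out2)
```

With this tokenization out1 is 0.0 (no shared words) and out2 is about 0.33 (one matching word out of six in total), so the exact 0.4 from the character-level run is not reproduced.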

Any alternative solution would be helpful.

Thanks in advance!

EDIT: Solved this using cosine similarity as the measure, following the accepted answer by vpekar here: How to calculate cosine similarity given 2 sentence strings? - Python
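For readers who want to see the idea without following the link, here is a minimal bag-of-words cosine similarity sketch. It is not a copy of the linked answer: the function names, the `\w+` tokenizer, and the lowercasing are assumptions made for illustration.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"\w+")  # assumed tokenizer: runs of word characters


def text_to_vector(text):
    """Map a string to a bag-of-words count vector."""
    return Counter(WORD.findall(text.lower()))


def cosine_similarity(text1, text2):
    """Cosine of the angle between the two word-count vectors, in [0, 1]."""
    vec1, vec2 = text_to_vector(text1), text_to_vector(text2)
    # Only words present in both vectors contribute to the dot product.
    dot = sum(vec1[w] * vec2[w] for w in vec1.keys() & vec2.keys())
    norm1 = math.sqrt(sum(c * c for c in vec1.values()))
    norm2 = math.sqrt(sum(c * c for c in vec2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty string has no direction; treat as no similarity
    return dot / (norm1 * norm2)


print(cosine_similarity('this is a private museum', 'temporary vice prez'))
print(cosine_similarity('this is a private museum', 'museum'))
```

Under these assumptions the first pair scores 0.0 and the second about 0.447, which matches the intent of the question: no shared words gives zero.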

Sailesh
  • Well, both strings to compare start with a `t` and have ` pr` in them. Without doing the math I'd say that should give at least some point. – Klaus D. Jan 16 '17 at 07:59
  • Agree with you @KlausD. But I'm looking for an alternate solution that'd match it at a word level. – Sailesh Jan 16 '17 at 08:05
  • What would be your expected output for such a case? – Nickil Maveli Jan 16 '17 at 08:13
  • @NickilMaveli I've edited my question to add the expected output. I believe something along the lines of sentence similarity (taking word matching into account) would be useful. – Sailesh Jan 16 '17 at 09:35
  • I was able to achieve it using Cosine similarity. SO reference: http://stackoverflow.com/questions/15173225/how-to-calculate-cosine-similarity-given-2-sentence-strings-python – Sailesh Jan 16 '17 at 09:56
  • Good that you solved it using cosine similarity. You can also have a look at the Jaccard index (https://en.wikipedia.org/wiki/Jaccard_index). It is pretty simple to implement and often works well :) – Yavar Jan 16 '17 at 10:29
  • @Yavar thanks for the suggestion! This works as well :) – Sailesh Jan 17 '17 at 09:13
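The Jaccard index suggested in the comments can be sketched in a few lines: the size of the intersection of the two word sets divided by the size of their union. The function name and the whitespace tokenization with lowercasing are assumptions made for illustration.

```python
def jaccard_similarity(text1, text2):
    """Jaccard index over the sets of words in each string, in [0, 1]."""
    # Assumption: lowercase + whitespace split; duplicates are ignored by set().
    words1 = set(text1.lower().split())
    words2 = set(text2.lower().split())
    union = words1 | words2
    if not union:
        return 0.0  # both strings empty: define similarity as 0
    return len(words1 & words2) / len(union)


print(jaccard_similarity('this is a private museum', 'temporary vice prez'))
print(jaccard_similarity('this is a private museum', 'museum'))
```

For the two examples from the question this gives 0.0 and 0.2 (one shared word out of five distinct words). Unlike cosine similarity on counts, it ignores how often a word repeats.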

0 Answers