I want to find partial matches between 2 strings/phrases and measure them on a scale of [0,1]
. I tried using SequenceMatcher
for the same.
Please find the sample code below:
from difflib import SequenceMatcher
out1 = SequenceMatcher(lambda x:x == " ",'this is a private museum','temporary vice prez').ratio()
out2 = SequenceMatcher(lambda x:x == " ",'this is a private museum','museum').ratio()
Here, the score I get for out1
is 0.279
and out2
is 0.4
.
However, out1
is not a match semantically, although out2
makes sense. How to evaluate the strings at a word level ?
Expected output would be something like out1 = 0
and out2=0.4
. Scoring should be based on word-level similarity.
Any alternate solution would be helpful.
Thanks in advance!
EDIT: Solved this using Cosine similarity as the measure by referring to the accepted solution by vpekar here: How to calculate cosine similarity given 2 sentence strings? - Python