Partial matching of phrases in Python

Question

I want to find partial matches between two strings/phrases and score them on a scale of [0, 1]. I tried using difflib's SequenceMatcher for this.

Please find the sample code below:

from difflib import SequenceMatcher

# The first argument is the "junk" predicate: spaces are ignored when matching.
out1 = SequenceMatcher(lambda x: x == " ", 'this is a private museum', 'temporary vice prez').ratio()
out2 = SequenceMatcher(lambda x: x == " ", 'this is a private museum', 'museum').ratio()

Here, the score I get for out1 is 0.279 and for out2 is 0.4. However, out1 is not a semantic match, whereas out2 makes sense. How can I evaluate the strings at the word level?

The expected output would be something like out1 = 0 and out2 = 0.4. Scoring should be based on word-level similarity.
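
A minimal sketch of one word-level option: SequenceMatcher accepts any sequences of hashable items, so it can compare lists of words instead of characters. The helper name `word_ratio` is hypothetical, and lowercasing plus splitting on whitespace is an assumed tokenization, not something prescribed by difflib.

```python
from difflib import SequenceMatcher


def word_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] computed over word tokens rather than characters.

    Hypothetical helper: tokenization (lowercase + whitespace split) is an assumption.
    """
    words_a = a.lower().split()
    words_b = b.lower().split()
    # ratio() is 2 * (matched items) / (total items in both sequences).
    return SequenceMatcher(None, words_a, words_b).ratio()


out1 = word_ratio('this is a private museum', 'temporary vice prez')
out2 = word_ratio('this is a private museum', 'museum')
print(out1)  # 0.0, no word is shared
print(out2)  # about 0.33: one matching word out of six words in total
```

With this approach the shared characters in `private` and `prez` no longer contribute, because only whole words can match.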

Any alternate solution would be helpful.

Thanks in advance!

EDIT: I solved this using cosine similarity as the measure, following the accepted answer by vpekar here: How to calculate cosine similarity given 2 sentence strings? - Python
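
For reference, a minimal sketch of word-level cosine similarity using bag-of-words counts. This is an illustration of the general technique, not a copy of the linked answer; the function names and the `\w+` tokenizer are assumptions.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"\w+")  # assumed tokenizer: runs of word characters


def text_to_vector(text: str) -> Counter:
    """Map a phrase to a bag-of-words count vector."""
    return Counter(WORD.findall(text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two word-count vectors, in [0, 1]."""
    vec_a, vec_b = text_to_vector(a), text_to_vector(b)
    # Dot product over the words both phrases share.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty phrase has no direction to compare
    return dot / (norm_a * norm_b)


print(cosine_similarity('this is a private museum', 'temporary vice prez'))  # 0.0
print(cosine_similarity('this is a private museum', 'museum'))  # about 0.447
```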

Well, both strings to compare start with a `t` and have ` pr` in them. Without doing the math I'd say that should give at least some point. — Klaus D., Jan 16 '17 at 07:59
Agree with you @KlausD. But I'm looking for an alternate solution that'd match it at a word level. — Sailesh, Jan 16 '17 at 08:05
@NickilMaveli I've edited my question to add the expected output. I believe something along the lines of sentence similarity (taking word matching into account) would be useful. — Sailesh, Jan 16 '17 at 09:35
I was able to achieve it using Cosine similarity. SO reference: http://stackoverflow.com/questions/15173225/how-to-calculate-cosine-similarity-given-2-sentence-strings-python — Sailesh, Jan 16 '17 at 09:56
Good that you solved it using cosine similarity. You can have a look at the Jaccard index (https://en.wikipedia.org/wiki/Jaccard_index) too. It is pretty simple to implement and works well in many cases :) — Yavar, Jan 16 '17 at 10:29
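
A minimal sketch of the Jaccard index mentioned in the last comment, applied to sets of words: the size of the intersection divided by the size of the union. The function name `jaccard` and the whitespace tokenization are assumptions made for illustration.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard index of the two phrases' word sets, in [0, 1]."""
    set_a = set(a.lower().split())
    set_b = set(b.lower().split())
    union = set_a | set_b
    if not union:
        return 0.0  # both phrases empty; treated as no similarity here (a convention)
    return len(set_a & set_b) / len(union)


print(jaccard('this is a private museum', 'temporary vice prez'))  # 0.0
print(jaccard('this is a private museum', 'museum'))  # 0.2, one shared word out of five distinct
```

Unlike cosine similarity on counts, this ignores how often a word repeats, which may or may not matter for short phrases.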
