
I need to measure the similarity between two sentences. For example:

s1 = "she is good a dog "
s2 = "she is nice a heel"

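I need to show that "good" is similar to "nice". For nouns and verbs, path similarity works, as in this sketch:

from nltk.corpus import wordnet as wn

def get_max(word1, word2):
    # best path_similarity over all sense pairs; unscored (None) pairs count as 0
    return max((a.path_similarity(b) or 0 for a in wn.synsets(word1) for b in wn.synsets(word2)), default=0)

get_max('dog', 'animal')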

Result: 0.33, which is a high value, so the words are related and I can call them similar. But for adjectives ("nice" and "good") the value, 0.09, is low!

Any ideas?

Mona
  • Is that supposed to be code?! – jonrsharpe Mar 15 '15 at 16:33
  • I think your question is "why are nice and good not being recognized as synonyms?". Perhaps you could rephrase it like that, and show us the real python code you are using that gives you the 0.09 number. – Darren Cook Mar 16 '15 at 08:31
  • The problem of finding semantic similarity between two sentences seems to be more complex than finding similarity between huge documents. https://www.hindawi.com/journals/tswj/2014/437162/. – pmuntima Apr 02 '17 at 02:52

1 Answer


You can compute path_similarity between each synset of "good" and a synset of "nice", then take the maximum:

>>> from nltk.corpus import wordnet as wn
>>> n=wn.synsets('nice')
>>> g=wn.synsets('good')
>>> [i.path_similarity(n[0]) for i in g]
[0.0625, 0.06666666666666667, 0.07142857142857142, 0.09090909090909091, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]

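>>> # Drop None scores first: Python 3 cannot compare None with a float
>>> max(s for s in (i.path_similarity(n[0]) for i in g) if s is not None)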
0.09090909090909091

Note that the synsets of a word cover several parts of speech (verb, noun, adjective, ...), so you need to select the appropriate one!
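Selecting the appropriate part of speech can be sketched as a small helper. This is only an illustration, not part of NLTK: the name `best_similarity` and its parameters are hypothetical, and it assumes objects that expose `.pos()` and a similarity method the way NLTK `Synset` objects do. In the session above, the four numeric scores followed by `None` values suggest that `n[0]` is only comparable with some of the senses of "good", most likely those sharing its part of speech.

```python
def best_similarity(synsets_a, synsets_b, pos_tags, measure="path_similarity"):
    """Return the highest score over all sense pairs restricted to pos_tags.

    Works with any objects exposing .pos() and the named similarity method,
    e.g. NLTK Synset objects. Returns None when no pair can be scored.
    """
    scores = []
    for a in synsets_a:
        if a.pos() not in pos_tags:
            continue
        for b in synsets_b:
            if b.pos() not in pos_tags:
                continue
            score = getattr(a, measure)(b)
            if score is not None:  # None means the measure could not score the pair
                scores.append(score)
    return max(scores) if scores else None

# Intended use with NLTK (requires the WordNet corpus to be installed):
#   from nltk.corpus import wordnet as wn
#   best_similarity(wn.synsets('good'), wn.synsets('nice'), pos_tags={'a', 's'})
```

The tag set `{'a', 's'}` covers adjectives and adjective satellites. Adjective pairs may still come back unscored, because path_similarity relies on is-a (hypernym) links, which WordNet defines mainly for nouns and verbs.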

Alternatively, you can use wup_similarity:

>>> round(max(s for s in (i.wup_similarity(n[0]) for i in g) if s is not None), 1)
0.4

Wu-Palmer Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node).
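To make that definition concrete, here is a minimal sketch of the usual Wu-Palmer formula, 2 * depth(LCS) / (depth(s1) + depth(s2)), on a tiny hand-made taxonomy. The taxonomy and all function names are hypothetical, and depth is assumed to count nodes from the root (so the root has depth 1). NLTK's own implementation handles further details, such as senses with several hypernym paths, that this sketch ignores.

```python
# Hypothetical toy taxonomy: child -> parent (not taken from WordNet)
PARENT = {
    "animal": "entity",
    "mammal": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "bird": "animal",
}

def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(node):
    """Number of nodes from the root down to node (the root has depth 1)."""
    return len(path_to_root(node))

def least_common_subsumer(a, b):
    """Deepest ancestor shared by a and b (a node counts as its own ancestor)."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward, so the first hit is the deepest
        if node in ancestors_of_a:
            return node
    return None

def wup(a, b):
    """Wu-Palmer score of two nodes in the toy taxonomy."""
    lcs = least_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wup("dog", "cat"))   # siblings under "mammal"
print(wup("dog", "bird"))  # only share the shallower ancestor "animal"
```

Because the score depends on how deep the shared ancestor sits, two senses that only meet near the root get a low score even when the path between them is short.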

Read more about synsets: http://www.nltk.org/howto/wordnet.html

Mazdak
  • In wordnet 3.1, nice#1 and good#6 are directly connected with "similar to". Does `wup_similarity` take that into account and consider it worth 0.4, whereas `path_similarity` does not use the similar-to relation? http://wordnetweb.princeton.edu/perl/webwn?o2=1&o0=1&o8=1&o1=1&o7=1&o5=1&o9=&o6=1&o3=1&o4=1&s=nice&i=3&h=01000000000#c – Darren Cook Mar 16 '15 at 08:37
  • Sadly, the similarity measures in the NLTK API to WordNet are for single lexical items and not for a full sentence =( Combining lexical similarities into a sentence similarity isn't an easy task. – alvas Aug 18 '16 at 22:59
  • @alvas Indeed, this answer will give an approximate result based on similarity of words. – Mazdak Aug 18 '16 at 23:06
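
As the comments note, word-level scores only approximate sentence similarity. One simple heuristic, sketched here under stated assumptions, matches each word to its best-scoring counterpart in the other sentence and averages the results in both directions. The function `sentence_similarity` and the stand-in scorer `toy_sim` are hypothetical names; in practice `word_sim` would wrap a WordNet measure such as the maximum path_similarity or wup_similarity over synsets.

```python
def sentence_similarity(words_a, words_b, word_sim):
    """Symmetric average of best-match word scores.

    word_sim(w1, w2) is any word-level scorer returning a float or None.
    """
    def directed(src, dst):
        if not src or not dst:
            return 0.0
        total = 0.0
        for w in src:
            # best counterpart for w; unscored pairs (None) count as 0
            total += max((word_sim(w, v) or 0.0) for v in dst)
        return total / len(src)

    return (directed(words_a, words_b) + directed(words_b, words_a)) / 2

# Hypothetical stand-in scorer so the sketch runs without WordNet
def toy_sim(w1, w2):
    if w1 == w2:
        return 1.0
    return 0.5 if {w1, w2} == {"good", "nice"} else None

s1 = "she is good a dog".split()
s2 = "she is nice a heel".split()
print(sentence_similarity(s1, s2, toy_sim))
```

This ignores word order and weights every word equally, so it should be read as a rough baseline rather than a full solution.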