While using WordNet for text matching, I realized that it can only match a single word to a single word. It cannot match a single word to a phrase.
As you can see below, I have two lists.
list1 = ['fruit', 'world']
list2 = [u'domain', u'creation Year', u'world Tournament Silver', u'relation', u'existence', u'id', u'publication',
         u'third Commander', u'management Region', u'ra', u'Earthquake', u'final Publication Year', u'creation Christian Bishop',
         u'Planet', u'management Position', u'Race', u'world', u'first Publication Year', u'main Domain',
         u'golden Globe Award', u'ist', u'race', u'world Tournament Bronze', u'top Level Domain', u'lower Earth Orbit Payload']
list2 contains both single words and phrases, such as 'relation' and 'management Position'.
Currently I use WordNet to compute the similarity:
from nltk.corpus import wordnet

for word1 in list1:
    for word2 in list2:
        # synsets() returns an empty list when WordNet has no entry for the string
        wordFromList1 = wordnet.synsets(word1)
        wordFromList2 = wordnet.synsets(word2)
        if wordFromList1 and wordFromList2:
            # compare only the first synset of each word
            s = wordFromList1[0].wup_similarity(wordFromList2[0])
            w1 = wordFromList1[0].lemmas()[0].name()
            w2 = wordFromList2[0].lemmas()[0].name()
            similarity = (s, w1, w2)
            print(similarity)
The result:
(0.125, u'fruit', u'sphere')
(0.16666666666666666, u'fruit', u'relation')
(0.14285714285714285, u'fruit', u'being')
(0.3157894736842105, u'fruit', u'Idaho')
(0.4444444444444444, u'fruit', u'publication')
(0.25, u'fruit', u'radium')
(0.25, u'fruit', u'earthquake')
(0.625, u'fruit', u'planet')
(0.125, u'fruit', u'race')
(0.6666666666666666, u'fruit', u'universe')
(0.125, u'fruit', u'race')
(0.15384615384615385, u'universe', u'sphere')
(0.2222222222222222, u'universe', u'relation')
(0.18181818181818182, u'universe', u'being')
(0.375, u'universe', u'Idaho')
(0.5333333333333333, u'universe', u'publication')
(0.3076923076923077, u'universe', u'radium')
(0.3076923076923077, u'universe', u'earthquake')
(0.7692307692307693, u'universe', u'planet')
(0.15384615384615385, u'universe', u'race')
(1.0, u'universe', u'universe')
(0.15384615384615385, u'universe', u'race')
The problem is that WordNet only compares single words; it does not compute the similarity between a single word and a phrase in list2, such as:
'world' VS 'world Tournament Silver'
'world' VS 'world Tournament Bronze'
'world' VS 'creation Year'
.......................
So how can I solve this problem?
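One workaround I can think of, as a rough sketch rather than a real solution: split each phrase into its words, score every word against the single word, and keep the best score. The names `phrase_similarity`, `wordnet_sim` and `exact_match_sim` are made up for illustration, and taking the maximum is only an assumption (an average would be another choice). `wordnet_sim` mirrors the first-synset comparison from my loop above and needs NLTK with its WordNet data installed; `exact_match_sim` is a toy stand-in so the sketch runs without it.

```python
def phrase_similarity(word, phrase, word_sim):
    """Best score between `word` and any whitespace-separated token of `phrase`.

    `word_sim(a, b)` is any callable that returns a float, or None when the
    two single words cannot be compared.
    """
    scores = []
    for token in phrase.split():
        score = word_sim(word.lower(), token.lower())
        if score is not None:
            scores.append(score)
    # None means that no token of the phrase could be compared at all
    return max(scores) if scores else None


def wordnet_sim(a, b):
    """Wu-Palmer similarity of the first synsets, as in the loop above."""
    # Imported lazily so the rest of the sketch works without NLTK
    from nltk.corpus import wordnet
    syn_a, syn_b = wordnet.synsets(a), wordnet.synsets(b)
    if syn_a and syn_b:
        return syn_a[0].wup_similarity(syn_b[0])
    return None


def exact_match_sim(a, b):
    """Toy stand-in (hypothetical): 1.0 if the words are equal, else 0.0."""
    return 1.0 if a == b else 0.0


if __name__ == "__main__":
    for phrase in [u'world Tournament Silver', u'creation Year']:
        score = phrase_similarity('world', phrase, exact_match_sim)
        print((score, 'world', phrase))
```

Passing `wordnet_sim` instead of `exact_match_sim` would give WordNet-based scores, but I am not sure whether the best-matching word is a fair summary of a whole phrase.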