
While using WordNet for text matching, I realized that WordNet can only match a single word to a single word. It cannot match a single word to a phrase.

As you can see below, I have two lists.

list1=['fruit', 'world']
list2=[u'domain', u'creation Year', u'world Tournament Silver', u'relation', u'existence', u'id', u'publication',
              u'third Commander', u'management Region', u'ra', u'Earthquake', u'final Publication Year', u'creation Christian Bishop',
              u'Planet', u'management Position', u'Race', u'world', u'first Publication Year', u'main Domain',
              u'golden Globe Award', u'ist', u'race', u'world Tournament Bronze', u'top Level Domain', u'lower Earth Orbit Payload']

list2 contains both single words and phrases, such as relation and management Position.

Currently I use WordNet to compute the similarity:

from nltk.corpus import wordnet

for word1 in list1:
    # Look up the synsets of word1 once, not once per word2
    synsets1 = wordnet.synsets(word1)
    for word2 in list2:
        synsets2 = wordnet.synsets(word2)
        # An empty lookup (as for the phrases in list2) means the pair is skipped
        if synsets1 and synsets2:
            # Compare only the first sense of each word
            s = synsets1[0].wup_similarity(synsets2[0])
            w1 = synsets1[0].lemmas()[0].name()
            w2 = synsets2[0].lemmas()[0].name()
            similarity = (s, w1, w2)
            print(similarity)

The result:

(0.125, u'fruit', u'sphere')
(0.16666666666666666, u'fruit', u'relation')
(0.14285714285714285, u'fruit', u'being')
(0.3157894736842105, u'fruit', u'Idaho')
(0.4444444444444444, u'fruit', u'publication')
(0.25, u'fruit', u'radium')
(0.25, u'fruit', u'earthquake')
(0.625, u'fruit', u'planet')
(0.125, u'fruit', u'race')
(0.6666666666666666, u'fruit', u'universe')
(0.125, u'fruit', u'race')
(0.15384615384615385, u'universe', u'sphere')
(0.2222222222222222, u'universe', u'relation')
(0.18181818181818182, u'universe', u'being')
(0.375, u'universe', u'Idaho')
(0.5333333333333333, u'universe', u'publication')
(0.3076923076923077, u'universe', u'radium')
(0.3076923076923077, u'universe', u'earthquake')
(0.7692307692307693, u'universe', u'planet')
(0.15384615384615385, u'universe', u'race')
(1.0, u'universe', u'universe')
(0.15384615384615385, u'universe', u'race')

The problem is that WordNet only compares single words. It does not compute the similarity between a single word and a phrase in list2,

such as 'world'  VS 'world Tournament Silver'
        'world'  VS 'world Tournament Bronze'
        'world'  VS 'creation Year'
        ...

So how can I solve this problem?
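
One direction that might work, shown only as a minimal sketch and not as a confirmed solution: split each phrase into its words, score every word against the single word, and keep the best score. The helper names `wordnet_wup` and `phrase_similarity` are hypothetical (they are not part of NLTK), and taking the maximum is just one assumed way to aggregate; an average would be another. The scoring function is passed in as a parameter so the aggregation can be tried with any word-to-word measure.

```python
def wordnet_wup(word1, word2):
    """Wu-Palmer similarity of the first synsets of two single words, or None."""
    # Imported lazily so phrase_similarity below can be tried without NLTK.
    from nltk.corpus import wordnet
    synsets1 = wordnet.synsets(word1)
    synsets2 = wordnet.synsets(word2)
    if not synsets1 or not synsets2:
        return None
    return synsets1[0].wup_similarity(synsets2[0])


def phrase_similarity(word, phrase, word_sim=wordnet_wup):
    """Best single-word score between `word` and any token of `phrase`.

    Returns (score, best_token), or (None, None) if no token could be scored.
    """
    best_score, best_token = None, None
    for token in phrase.split():
        # Lowercase each token before scoring, e.g. 'Tournament' -> 'tournament'
        score = word_sim(word, token.lower())
        if score is not None and (best_score is None or score > best_score):
            best_score, best_token = score, token
    return best_score, best_token

# Example call (requires NLTK and its WordNet data to be installed):
#   phrase_similarity(u'world', u'world Tournament Silver')
```

With this approach a phrase is never looked up as a whole, so 'world' VS 'world Tournament Silver' would be scored through the token 'world' instead of being skipped.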

bob90937
  • It's really slow but I think it's something you're looking for https://github.com/alvations/pywsd/blob/master/pywsd/similarity.py – alvas Oct 21 '16 at 08:33
  • There's also quite a lot of similar questions: https://www.google.com.sg/search?client=safari&rls=en&q=similarity+between+2+words+stackoverflow – alvas Oct 21 '16 at 08:35
  • Take a look at this post, it may be helpful: http://stackoverflow.com/questions/19348973/all-synonyms-for-word-in-python – Enix Oct 21 '16 at 08:35
  • I would like to know what other methods there are for text matching. I am not asking how to use WordNet to do the text matching, so this is not a duplicate. – bob90937 Oct 21 '16 at 08:39
  • The answer is still pretty much endless possibilities =) – alvas Oct 21 '16 at 09:14
  • What is the goal of the "matching" you are interested in? What kind of similarity? "Semantic" still leaves a lot of options. What do these word lists represent? Are you trying to group documents, or words? Not only are there too many answers, the question itself is so vague as to be nearly meaningless. Please clarify so you can get some useful answers. – alexis Oct 21 '16 at 15:55
