Say I have a word A and a word B, where B serves as a hint that implies the meaning of A. For instance, given the pair A = bass, B = music, a human can immediately tell which sense of A is meant.

I know there are many word sense disambiguation (WSD) algorithms that work on sentences. I'm wondering whether any algorithms have been developed for doing WSD on just a pair of words.

Kelvin Lee

1 Answer

Word Sense Disambiguation (WSD) is the task of disambiguating a word given a context sentence or document. In the case of a two-token phrase, the context is simply the other token.
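To make the idea concrete, here is a minimal sketch of a simplified Lesk-style overlap for a word pair. It is not the pywsd or NLTK implementation: the sense labels and glosses in `TOY_SENSES` are hypothetical ones written for this illustration (a real system would pull glosses from WordNet), and `toy_lesk` is a made-up helper name.

```python
# Hypothetical sense inventory written for this sketch; not real WordNet glosses.
TOY_SENSES = {
    "bass": {
        "bass.fish": "an edible freshwater fish found in lakes and rivers",
        "bass.music": "the lowest range of sound in music or a musical instrument",
    },
}

STOPWORDS = {"a", "an", "the", "of", "in", "or", "and", "with"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {tok for tok in text.lower().split() if tok not in STOPWORDS}


def toy_lesk(hint_word, ambiguous_word, senses=TOY_SENSES):
    """Return the sense whose gloss shares the most tokens with the hint word."""
    context = tokenize(hint_word)
    best_sense, best_overlap = None, -1
    for sense_id, gloss in senses.get(ambiguous_word, {}).items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense  # None if the word is not in the inventory


if __name__ == "__main__":
    print(toy_lesk("music", "bass"))
    print(toy_lesk("fish", "bass"))
```

With a single hint word the overlap is at most one token, which is why a gloss-overlap method like this is best treated as a baseline for word pairs.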

You can try out different WSD software; there is a list in the question "Anyone know of some good Word Sense Disambiguation software?"

I'll give you an example using pywsd (https://github.com/alvations/pywsd):

$ wget https://github.com/alvations/pywsd/archive/master.zip
$ unzip master.zip
$ cd pywsd-master
$ python
Python 2.7.5+ (default, Feb 27 2014, 19:37:08) 
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lesk import simple_lesk
# disambiguating the word 'bass' given the context 'bass music'
>>> simple_lesk('bass music', 'bass') 
Synset('bass.n.07')
>>> disambiguated = simple_lesk('bass music', 'bass')
>>> disambiguated.definition
<bound method Synset.definition of Synset('bass.n.07')>
>>> disambiguated.definition()
u'the member with the lowest range of a family of musical instruments'

Alternatively, you can use the new wsd module in NLTK (https://github.com/nltk/nltk/blob/develop/nltk/wsd.py), provided that you have the bleeding-edge version:

from nltk.wsd import lesk
# lesk() treats the context as an iterable of tokens, so split the phrase first
disambiguated = lesk(context_sentence="bass music".split(), ambiguous_word="bass")
print(disambiguated.definition())

(Disclaimer: I wrote both pywsd and the lesk module in NLTK)

alvas
  • Thank you! I saw many algorithms in the list. Which do you think would be the most suitable for my task? By most suitable, I mean higher accuracy and lower algorithmic/space complexity. – Kelvin Lee Jun 21 '14 at 02:24
  • I don't have many. I only have two main algorithms, i.e. Lesk and maximizing similarity; the rest are in progress. I suggest you use any of the Lesk algorithms only as a baseline. I'll try to finish the rest of the code when I'm free. – alvas Jun 21 '14 at 07:16
  • I suggest adapted_lesk as the "strongest" Lesk variant. For similarity (which takes longer), path similarity works well. – alvas Jun 21 '14 at 07:17
  • [Here](http://arxiv.org/pdf/1204.1406.pdf) is a research paper you might find interesting. – Explorer Jun 30 '14 at 10:07
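
For readers curious about the "maximizing similarity" approach mentioned in the comments, the following is a rough sketch of the idea, not pywsd's code. It assumes a tiny hand-made hypernym tree (`TOY_HYPERNYMS`) and sense list (`TOY_SENSES`), both hypothetical, and uses "guitar" and "trout" as hint words because they fit the toy tree. Path similarity is taken here as 1 / (shortest path length + 1), the usual WordNet-style definition.

```python
# Hypothetical child -> parent hypernym links invented for this sketch.
TOY_HYPERNYMS = {
    "bass.fish": "fish",
    "trout.fish": "fish",
    "fish": "animal",
    "animal": "entity",
    "bass.instrument": "instrument",
    "guitar.instrument": "instrument",
    "instrument": "artifact",
    "artifact": "entity",
}

# Hypothetical senses per word.
TOY_SENSES = {
    "bass": ["bass.fish", "bass.instrument"],
    "guitar": ["guitar.instrument"],
    "trout": ["trout.fish"],
}


def ancestors(node, hypernyms):
    """Map node and each of its ancestors to its distance from node."""
    dist, steps = {}, 0
    while node is not None:
        dist[node] = steps
        node = hypernyms.get(node)
        steps += 1
    return dist


def path_similarity(a, b, hypernyms):
    """1 / (shortest path through a shared ancestor + 1); 0.0 if none is shared."""
    da, db = ancestors(a, hypernyms), ancestors(b, hypernyms)
    shared = set(da) & set(db)
    if not shared:
        return 0.0
    return 1.0 / (min(da[n] + db[n] for n in shared) + 1)


def max_similarity(hint_word, ambiguous_word, senses, hypernyms):
    """Pick the sense of ambiguous_word closest to any sense of hint_word."""
    best_sense, best_score = None, -1.0
    for candidate in senses.get(ambiguous_word, []):
        score = max(
            (path_similarity(candidate, h, hypernyms) for h in senses.get(hint_word, [])),
            default=0.0,
        )
        if score > best_score:
            best_sense, best_score = candidate, score
    return best_sense


if __name__ == "__main__":
    print(max_similarity("guitar", "bass", TOY_SENSES, TOY_HYPERNYMS))
    print(max_similarity("trout", "bass", TOY_SENSES, TOY_HYPERNYMS))
```

Because every candidate sense is compared against every sense of the hint word, this costs more than a gloss overlap, which matches the remark that similarity "takes longer".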