What represents the state-of-the-art in Word Sense Disambiguation (WSD) software? What metrics determine the state-of-the-art, and what toolkits / open source packages are available?

alvas
Blodstone
  • I've tried that already; I just want to know what other software is available on the net. – Blodstone Jan 06 '11 at 12:47
  • State-of-the-art WSD has changed with every SemEval cycle (i.e. every 3 years), because the evaluation criteria keep changing as new machine learning technology and sense-annotation and related resources become available. For a thorough overview of the evaluation and WSD science developed over the years, I would recommend the SemEval Wikipedia page or the SemEval portal http://aclweb.org/aclwiki/index.php?title=SemEval_Portal – alvas Sep 05 '12 at 04:06
  • Define "good" :-) If a toolkit approach is acceptable, then the NLTK toolkit for Python is worth looking at. Open source and there are a couple of good books, including one from O'Reilly which has been open published online. Intended for teaching, so typically each supported operation has multiple implemented algorithms, and the books have a very practical feel to them. – winwaed Jan 11 '11 at 03:41

1 Answer

My list is not exhaustive; Googling for more will surely serve your purposes better.

For software, here's a short list. Remember to CITE the relevant sources!

GWSD: Unsupervised Graph-based Word Sense Disambiguation http://lit.csci.unt.edu/~rada/downloads/GWSD/GWSD.1.0.tar.gz

SenseLearner: All-Words Word Sense Disambiguation Tool http://lit.csci.unt.edu/~rada/downloads/senselearner/SenseLearner2.0.tar.gz

KYOTO UKB graph-based WSD http://ixa2.si.ehu.es/ukb/

pyWSD: Python Implementation of Simple WSD algorithms https://github.com/alvations/pywsd
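To give a feel for what a "simple WSD algorithm" looks like, here is a minimal sketch of a simplified Lesk-style disambiguator: it picks the sense whose gloss shares the most words with the context. This is not the API of pyWSD or of any other package listed above; the sense inventory `TOY_SENSES`, the `STOPWORDS` set, and the function names are all made up for illustration. A real system would take its glosses from a resource such as WordNet.

```python
import re

# Hypothetical toy sense inventory (sense label -> gloss), for illustration only.
TOY_SENSES = {
    "bank": {
        "bank.financial": "an institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a river or other body of water",
    }
}

# Small ad-hoc stopword list; an assumption, not taken from any toolkit.
STOPWORDS = {"a", "an", "and", "at", "by", "i", "in", "of", "on", "or",
             "other", "that", "the", "to", "beside"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return {tok for tok in re.findall(r"[a-z]+", text.lower())
            if tok not in STOPWORDS}


def simplified_lesk(word, context, inventory=TOY_SENSES):
    """Return the sense whose gloss overlaps most with the context.

    Ties go to the sense listed first; unknown words return None.
    """
    context_tokens = tokenize(context) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in inventory.get(word, {}).items():
        overlap = len(context_tokens & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "I went to the bank to deposit money"))
print(simplified_lesk("bank", "We sat on the bank of the river watching the water"))
```

Exact word overlap is crude ("deposit" does not match "deposits"), which is one reason practical systems add stemming or lemmatization before comparing.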


WSD tasks also depend on the data source, so here are a few. Remember to CITE them too!

Open Mind Word Expert Sense Tagged Data http://teach-computers.org/word-expert.html

TWA Sense Tagged Data http://lit.csci.unt.edu/~rada/downloads/TWA/TWA.tar.gz

SemCor http://lit.csci.unt.edu/~rada/downloads/semcor/semcor1.6.tar.gz


Lastly, WSD tasks depend on some preprocessing. If you're looking into state-of-the-art crosslingual WSD, you should look into word-level aligners such as:

  • MOSES
  • MGIZA++
  • GIZA++
  • BerkeleyAligner
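As a rough sketch of why word alignments matter for crosslingual WSD: once a parallel corpus is aligned, the translations of an ambiguous word can serve as sense labels. The code below assumes the alignments are already available as whitespace-separated `i-j` index pairs (source index, target index); check the actual output format of whichever aligner you use. The toy bitext, its hand-written alignments, and the function names are hypothetical and do not come from any of the tools above.

```python
from collections import Counter, defaultdict


def parse_alignment(line):
    """Parse 'i-j' pairs (source index, target index) into a list of tuples."""
    return [tuple(int(idx) for idx in pair.split("-")) for pair in line.split()]


def translation_counts(bitext):
    """Count, for each source word, the target words it is aligned to.

    `bitext` is an iterable of (source_sentence, target_sentence, alignment_line),
    where both sentences are already tokenized and space-separated.
    """
    counts = defaultdict(Counter)
    for source, target, alignment in bitext:
        src_tokens, tgt_tokens = source.split(), target.split()
        for i, j in parse_alignment(alignment):
            counts[src_tokens[i].lower()][tgt_tokens[j].lower()] += 1
    return counts


# Hypothetical English-German toy data with hand-written alignments.
TOY_BITEXT = [
    ("the bank raised interest rates",
     "die Bank hob die Zinsen an",
     "0-0 1-1 2-2 2-5 3-4 4-4"),
    ("we walked along the river bank",
     "wir gingen am Ufer entlang",
     "0-0 1-1 2-4 3-2 4-3 5-3"),
]

counts = translation_counts(TOY_BITEXT)
# The distinct translations of "bank" act as coarse sense labels.
print(counts["bank"])
```

Here the two German translations of "bank" separate the financial and the river readings, which is the intuition behind using translations as a sense inventory.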

Also, look at previous Senseval/SemEval pages to see what has already been done and which trends future tasks are moving towards. http://en.wikipedia.org/wiki/SemEval

alvas
  • Hi, I need to use WordNet-based WSD with Java. Can you suggest a jar file or any package that performs this function? – CTsiddharth Feb 16 '12 at 07:01
  • @2er0 Any success with your project? – zanbri Aug 08 '12 at 14:23
  • Crosslingual WSD was indeed a hard task. I ranked last at SemEval-2013, but my cosine system was super 'resource-lean', and I think that, given enough data, it should scale to reach at least close to 3rd/4th place. – alvas Dec 30 '13 at 15:29