I need to build a table of relations between the words (synsets) in an arbitrary raw text, using the `path_similarity` method.

>>> from nltk.corpus import wordnet as wn
>>> sent = "I went to the bank to deposit money".split()
>>> wn.synsets('bank')
[Synset('bank.n.01'), Synset('depository_financial_institution.n.01'), Synset('bank.n.03'), Synset('bank.n.04'), Synset('bank.n.05'), Synset('bank.n.06'), Synset('bank.n.07'), Synset('savings_bank.n.02'), Synset('bank.n.09'), Synset('bank.n.10'), Synset('bank.v.01'), Synset('bank.v.02'), Synset('bank.v.03'), Synset('bank.v.04'), Synset('bank.v.05'), Synset('deposit.v.02'), Synset('bank.v.07'), Synset('trust.v.01')]

How can I get the correct synset for each word from the raw text?

I can get the lemmas and POS tags like this:

>>> from nltk import pos_tag
>>> from nltk.stem import WordNetLemmatizer
>>> wnl = WordNetLemmatizer()
>>> wnl.lemmatize('banks')
u'bank'
>>> pos_tag(['banks'])
[('banks', 'NNS')]
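
One wrinkle with the snippet above: `pos_tag` returns Penn Treebank tags such as `NNS`, while `WordNetLemmatizer.lemmatize` and `wn.synsets` take WordNet's one-letter codes (`n`, `v`, `a`, `r`). A minimal sketch of bridging the two is below. The helper names `penn_to_wordnet` and `lemmatize_tagged` are made up for illustration, and the prefix-based mapping is an assumption that ignores adjective satellites (`s`).

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS code, or None if there is no match."""
    if tag.startswith('JJ'):
        return 'a'  # adjective, same value as wn.ADJ
    if tag.startswith('VB'):
        return 'v'  # verb, same value as wn.VERB
    if tag.startswith('NN'):
        return 'n'  # noun, same value as wn.NOUN
    if tag.startswith('RB'):
        return 'r'  # adverb, same value as wn.ADV
    return None     # determiners, prepositions, particles, etc.


def lemmatize_tagged(tagged_tokens):
    """Lemmatize (word, Penn tag) pairs; needs NLTK and the WordNet corpus."""
    # Imported here so that penn_to_wordnet stays usable without NLTK installed.
    from nltk.stem import WordNetLemmatizer
    wnl = WordNetLemmatizer()
    result = []
    for word, tag in tagged_tokens:
        pos = penn_to_wordnet(tag)
        # Words outside WordNet's four classes are passed through unchanged.
        lemma = wnl.lemmatize(word, pos) if pos else word
        result.append((lemma, pos))
    return result
```

With the tagger output shown above, `lemmatize_tagged(pos_tag(['banks']))` would pass `'n'` to the lemmatizer, and the same code can be handed to `wn.synsets('bank', pos='n')` to narrow the candidate list to noun senses.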

But how do I get the correct synset/sense number?

  • Are you looking for word sense disambiguation software? Have you tried `nltk.wsd.lesk`? https://github.com/nltk/nltk/blob/develop/nltk/wsd.py – alvas Jan 18 '15 at 23:56
  • Thanks a lot. It's the solution I've been looking for. – MisterMe Jan 19 '15 at 22:29
  • see also: https://github.com/alvations/pywsd (Disclaimer: I wrote them) – alvas Jan 19 '15 at 23:16
  • possible duplicate of [Word sense disambiguation in NLTK Python](http://stackoverflow.com/questions/3699810/word-sense-disambiguation-in-nltk-python) – alvas Jan 19 '15 at 23:23
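
For reference, the `nltk.wsd.lesk` suggestion from the comments could be combined with `path_similarity` roughly as follows. This is only a sketch under a few assumptions: the function names `pick_senses` and `similarity_table` are hypothetical, the POS hints are supplied by hand, and Lesk chooses a sense by gloss overlap, so the synset it returns is a guess that may well be wrong for a given sentence.

```python
from itertools import combinations


def pick_senses(tokens, pos_by_word=None):
    """Choose one synset per distinct token using NLTK's Lesk implementation.

    Tokens for which Lesk finds no synset (e.g. "the") are skipped.
    """
    # Needs NLTK plus the WordNet corpus; imported lazily for that reason.
    from nltk.wsd import lesk
    pos_by_word = pos_by_word or {}
    senses = {}
    for word in tokens:
        # lesk(context_sentence, ambiguous_word, pos) returns a Synset or None.
        synset = lesk(tokens, word, pos_by_word.get(word))
        if synset is not None:
            senses[word] = synset
    return senses


def similarity_table(senses):
    """Return {(word_a, word_b): path_similarity} for every pair of chosen senses."""
    table = {}
    for (w1, s1), (w2, s2) in combinations(senses.items(), 2):
        # The score can be None when WordNet has no path connecting the two synsets.
        table[(w1, w2)] = s1.path_similarity(s2)
    return table


if __name__ == "__main__":
    sent = "I went to the bank to deposit money".split()
    # Hand-written POS hints; in practice these could come from a tagger.
    senses = pick_senses(sent, {"bank": "n", "deposit": "v", "money": "n"})
    for pair, score in similarity_table(senses).items():
        print(pair, score)
```

Keeping the sense selection separate from the table construction means a different disambiguator (for example one from pywsd) could be swapped in without touching `similarity_table`.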
