How to get correct synsets from the raw text?

Asked Jan 18 '15 at 23:18

Active Jan 19 '15 at 23:21

I need to build a table of relations between the words (synsets) found in arbitrary raw text, using the `path_similarity` method. Looking up a word in WordNet returns every possible sense:

>>> from nltk.corpus import wordnet as wn
>>> sent = "I went to the bank to deposit money".split()
>>> wn.synsets('bank')
[Synset('bank.n.01'), Synset('depository_financial_institution.n.01'), Synset('bank.n.03'), Synset('bank.n.04'), Synset('bank.n.05'), Synset('bank.n.06'), Synset('bank.n.07'), Synset('savings_bank.n.02'), Synset('bank.n.09'), Synset('bank.n.10'), Synset('bank.v.01'), Synset('bank.v.02'), Synset('bank.v.03'), Synset('bank.v.04'), Synset('bank.v.05'), Synset('deposit.v.02'), Synset('bank.v.07'), Synset('trust.v.01')]

How can I get the correct synset for each word from the raw text?

I can get the lemmas and POS tags like this:

>>> from nltk import pos_tag
>>> from nltk.stem import WordNetLemmatizer
>>> wnl = WordNetLemmatizer()
>>> wnl.lemmatize('banks')
u'bank'
>>> pos_tag(['banks'])
[('banks', 'NNS')]

But how do I get the correct synset/sense number?
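The lemma and POS tag shown above can already narrow the candidate list before any disambiguation happens. The following is a minimal sketch, not a definitive recipe: `penn_to_wordnet` and `candidate_synsets` are hypothetical helper names, and the mapping assumes Penn Treebank tags as returned by `pos_tag`. The single-letter values match the WordNet POS constants `wn.NOUN`, `wn.VERB`, `wn.ADJ` and `wn.ADV`.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None.

    Hypothetical helper: only nouns, verbs, adjectives and adverbs
    are covered, so every other tag yields None.
    """
    if tag.startswith('N'):
        return 'n'  # same value as wn.NOUN
    if tag.startswith('V'):
        return 'v'  # same value as wn.VERB
    if tag.startswith('J'):
        return 'a'  # same value as wn.ADJ
    if tag.startswith('RB'):
        return 'r'  # same value as wn.ADV (RP, a particle, is excluded)
    return None


def candidate_synsets(word, tag):
    """Return the synsets of `word` restricted to the POS implied by `tag`."""
    # Imported lazily so the tag mapping above works without the WordNet data.
    from nltk.corpus import wordnet as wn
    from nltk.stem import WordNetLemmatizer

    pos = penn_to_wordnet(tag)
    if pos is None:
        return []
    lemma = WordNetLemmatizer().lemmatize(word, pos=pos)
    return wn.synsets(lemma, pos=pos)
```

With the WordNet corpus installed, `candidate_synsets('banks', 'NNS')` should return only the noun senses of `bank`. That shortens the list but still leaves the choice among them open.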

edited Jan 19 '15 at 23:21

alvas

asked Jan 18 '15 at 23:18

MisterMe

Are you looking for word sense disambiguation software? Have you tried `nltk.wsd.lesk`? https://github.com/nltk/nltk/blob/develop/nltk/wsd.py – alvas Jan 18 '15 at 23:56
Thanks a lot. It's the solution I've been looking for. – MisterMe Jan 19 '15 at 22:29
See also: https://github.com/alvations/pywsd (Disclaimer: I wrote it) – alvas Jan 19 '15 at 23:16
possible duplicate of [Word sense disambiguation in NLTK Python](http://stackoverflow.com/questions/3699810/word-sense-disambiguation-in-nltk-python) – alvas Jan 19 '15 at 23:23
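
The `nltk.wsd.lesk` suggestion from the comments could be combined with `path_similarity` roughly as follows. This is a sketch under assumptions rather than a tested solution: `disambiguate` and `similarity_table` are hypothetical names, the sentence is assumed to be tokenized already, and the simplified Lesk algorithm in NLTK is known to pick implausible senses fairly often, so the results need checking.

```python
def disambiguate(tokens):
    """Choose one synset per token using NLTK's simplified Lesk.

    Returns a dict mapping token -> Synset; tokens for which WordNet
    has no entry are skipped.
    """
    # Imported lazily; requires NLTK with the WordNet corpus downloaded.
    from nltk.wsd import lesk

    senses = {}
    for token in tokens:
        synset = lesk(tokens, token)  # the whole sentence is the context
        if synset is not None:
            senses[token] = synset
    return senses


def similarity_table(senses):
    """Build {(word_a, word_b): similarity} for every unordered word pair.

    `senses` maps words to objects exposing `path_similarity(other)`,
    such as WordNet synsets. A value may be None when no path connects
    the two synsets.
    """
    words = sorted(senses)
    table = {}
    for i, first in enumerate(words):
        for second in words[i + 1:]:
            table[(first, second)] = senses[first].path_similarity(senses[second])
    return table
```

For the sentence in the question, the table would be `similarity_table(disambiguate("I went to the bank to deposit money".split()))`. Function words such as "to" and "the" have no WordNet entry and drop out on their own, though filtering stopwords beforehand may still be worthwhile.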
