
I am wondering if there is a simple way to get synonyms of nouns in WordNet. Synonyms of adjectives, on the other hand, seem quite easy to get:

from nltk.corpus import wordnet as wn

for ss in wn.synsets('beautiful'):
    print(ss)
    for sim in ss.similar_tos():
        print('    {}'.format(sim))

I found the code above in another SO question, and it works well for adjectives. But when my word is 'gasoline' or 'fire', the results are terrible. Ideally, I would get a list of words very similar to the ones on this site.
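A likely reason the nouns come back empty: `similar_tos()` follows WordNet's "similar to" pointer, which links adjective synsets, so for noun senses such as those of 'gasoline' or 'fire' the inner loop has nothing to print. A rough noun counterpart, which is my own substitution rather than part of the snippet above, is a sense's own lemma names plus its sister terms (the other hyponyms of its hypernyms). The sketch relies only on the NLTK `Synset` methods `lemma_names()`, `hypernyms()` and `hyponyms()`; `sister_terms` and `noun_related_words` are hypothetical helper names.

```python
def sister_terms(synset):
    """Lemma names of the synset's siblings: the other hyponyms of each hypernym."""
    names = set()
    for parent in synset.hypernyms():
        for sibling in parent.hyponyms():
            if sibling != synset:
                names.update(sibling.lemma_names())
    return names


def noun_related_words(synset):
    """The synset's own lemma names plus its sister terms, underscores -> spaces."""
    words = set(synset.lemma_names()) | sister_terms(synset)
    return sorted(word.replace('_', ' ') for word in words)


# Usage (needs NLTK plus its WordNet data, e.g. nltk.download('wordnet')):
#   from nltk.corpus import wordnet as wn
#   for ss in wn.synsets('fire', pos=wn.NOUN):
#       print(ss, noun_related_words(ss))
```

Sister terms are related words rather than strict synonyms, so expect a broader list than a thesaurus entry would give.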

Something else I have tried gives good results but is extremely slow:

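from nltk.corpus import wordnet as wn

def syn(word, lch_threshold=2.26):
    target = wn.synset(word)  # look the target up once instead of on every iteration
    # LCH similarity is only defined within one part of speech, so scan only
    # synsets with the target's POS instead of swallowing errors with a bare except.
    for net1 in wn.all_synsets(pos=target.pos()):
        lch = net1.lch_similarity(target)
        # The value to compare the LCH to was found empirically.
        # (The value is very application dependent. Experiment!)
        if lch is not None and lch >= lch_threshold:
            yield (net1, lch)

for x in syn('gasoline.n.01'):
    print(x)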

This was also taken from another SO question. Is there an easier way to get synonyms of nouns, like the ones at the link provided above?
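Why the second method is slow, and a way around it, sketched under one assumption. NLTK's `lch_similarity` computes `-log((d + 1) / (2 * D))`, where `d` is the shortest path distance through a common hypernym and `D` is the maximum depth of that part of speech's taxonomy. Assuming `D = 19` for nouns (what NLTK derives for WordNet 3.0), a distance of 2 scores `ln(38/3) ≈ 2.54` and a distance of 3 scores `ln(38/4) ≈ 2.25`, so the empirical 2.26 threshold keeps exactly the synsets at most 2 edges away. Instead of scoring every noun synset, you can enumerate that small neighborhood by climbing hypernyms and walking back down hyponyms, then score only those candidates. `max_distance_for`, `neighborhood` and `fast_syn` are hypothetical helpers, not NLTK functions.

```python
import math


def max_distance_for(threshold, depth=19):
    """Largest path distance d with -log((d + 1) / (2 * depth)) >= threshold.

    depth=19 is an assumption: the noun-taxonomy depth NLTK uses for WordNet 3.0.
    """
    return math.floor(2 * depth * math.exp(-threshold)) - 1


def neighborhood(synset, max_distance):
    """Map every synset reachable by going up k hypernym edges and then down at
    most (max_distance - k) hyponym edges to its shortest such distance."""
    # Ancestors of the synset (including itself) with their distance upward.
    ancestors = {synset: 0}
    frontier = {synset}
    for up in range(1, max_distance + 1):
        frontier = {h for s in frontier
                    for h in s.hypernyms() + s.instance_hypernyms()}
        for h in frontier:
            ancestors.setdefault(h, up)
    # From each ancestor, walk back down with whatever distance budget is left.
    found = {}
    for ancestor, up in ancestors.items():
        level = {ancestor}
        budget = max_distance - up
        for down in range(budget + 1):
            for s in level:
                if s not in found or up + down < found[s]:
                    found[s] = up + down
            if down < budget:
                level = {h for s in level
                         for h in s.hyponyms() + s.instance_hyponyms()}
    return found


def fast_syn(target, lch_threshold=2.26, depth=19):
    """Yield (synset, lch) pairs like the full scan, scoring only nearby candidates.

    Matches a full scan only if `depth` is the taxonomy depth NLTK actually uses.
    """
    for candidate in neighborhood(target, max_distance_for(lch_threshold, depth)):
        lch = candidate.lch_similarity(target)
        if lch is not None and lch >= lch_threshold:
            yield candidate, lch


# Usage (needs NLTK with WordNet data):
#   from nltk.corpus import wordnet as wn
#   for x in fast_syn(wn.synset('gasoline.n.01')):
#       print(x)
```

The results come out in no particular order because the neighborhood is built from sets; sort by the LCH score if you want the closest senses first.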

Ted Petrou

2 Answers

Here's a hacky way of getting synonyms by scraping thesaurus.com. I tried some thesaurus APIs but wasn't getting exactly what I wanted.

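import requests
from bs4 import BeautifulSoup

def get_syns(old_words):
    new_words = dict()
    for word, score in old_words.items():
        new_words[word] = score
        for syn in get_web_syns(word):
            # setdefault so a synonym never overwrites a score already assigned (e.g. 'icy': 2)
            new_words.setdefault(syn, 1)
    return new_words

def get_web_syns(word):
    req = requests.get('http://www.thesaurus.com/browse/' + word)
    soup = BeautifulSoup(req.text, 'html.parser')
    # These class names matched thesaurus.com's markup when this was written;
    # the site's HTML may have changed since, in which case nothing is found.
    all_syns = soup.find('div', {'class': 'relevancy-list'})
    if all_syns is None:
        return []
    syns = []
    for ul in all_syns.find_all('ul'):
        for li in ul.find_all('span', {'class': 'text'}):
            syns.append(li.text.split()[0])  # keeps only the first word of each entry
    return syns

cold = {'icy': 2, 'ice': 1, 'snow': 1}
print(get_syns(cold))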

Which returns: {u'algific': 1, u'antarctic': 1, u'arctic': 1, u'biting': 1, u'bitter': 1, u'blizzard': 1, u'chill': 1, u'chilled': 1, u'chilling': 1, u'chilly': 1, u'chunk': 1, u'cold': 1, u'crystal': 1, u'cube': 1, u'diamonds': 1, u'dry': 1, u'floe': 1, u'freezing': 1, u'frigid': 1, u'frigorific': 1, u'frost-bound': 1, u'frosty': 1, u'frozen': 1, u'gelid': 1, u'glacial': 1, u'glacier': 1, u'glaring': 1, u'glaze': 1, u'hail': 1, u'hailstone': 1, 'ice': 1, u'iceberg': 1, u'iced': 1, u'icicle': 1, 'icy': 2, u'permafrost': 1, u'polar': 1, u'raw': 1, u'refrigerated': 1, u'rimy': 1, u'shivering': 1, u'shivery': 1, u'sleet': 1, u'sleeted': 1, u'smooth': 1, 'snow': 1, u'snowfall': 1}

A dict is used to assign scores to words for my specific application.
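Since the scores live in a plain dict, the merge step does not have to be tied to thesaurus.com. A minimal sketch, assuming you want to plug in any synonym source (the scraper above, WordNet lemmas, or a fixed table): `expand_scores` is a hypothetical helper that takes the source as a function. It copies the original scores first, so a synonym can never overwrite a word you scored yourself.

```python
def expand_scores(old_words, lookup, default_score=1):
    """Return old_words plus every synonym lookup(word) yields, scored default_score.

    Original words keep their own scores even if they also show up as a synonym.
    """
    new_words = dict(old_words)
    for word in old_words:
        for syn in lookup(word):
            new_words.setdefault(syn, default_score)
    return new_words


# Usage:
#   expand_scores(cold, get_web_syns)          # same idea as get_syns above
#   from nltk.corpus import wordnet as wn      # or WordNet instead of scraping
#   expand_scores(cold, lambda w: [l for s in wn.synsets(w) for l in s.lemma_names()])
```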

Ted Petrou

Regardless of whether you deal with nouns, verbs or adjectives, you always get the synonyms of a synset via `Synset.lemmas()` (or `Synset.lemma_names()` for plain strings), e.g. `wn.synsets('gasoline')[0].lemmas()`.
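To illustrate this answer's point, a minimal sketch that collects the lemma names of whichever synsets you pass in. Multi-word lemmas are stored with underscores, so they are turned into spaces. `synonyms` is a hypothetical helper: passing a single sense gives that sense's synonyms, while passing every noun sense of a word mixes unrelated meanings.

```python
def synonyms(synsets, exclude=None):
    """Lemma names of the given synsets, in order, without duplicates.

    Underscores in multi-word lemmas become spaces; `exclude` drops the
    query word itself.
    """
    result = []
    for synset in synsets:
        for name in synset.lemma_names():
            word = name.replace('_', ' ')
            if word != exclude and word not in result:
                result.append(word)
    return result


# Usage (needs NLTK with WordNet data):
#   from nltk.corpus import wordnet as wn
#   synonyms([wn.synsets('gasoline')[0]], exclude='gasoline')   # one sense
#   synonyms(wn.synsets('fire', pos=wn.NOUN), exclude='fire')   # all noun senses
```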

char bugs
  • `wn.synsets('word')` by itself gives synonyms, though they are not interesting for nouns. Adjectives still need the extra step provided in the first method above. Nouns yield nothing with that method and nothing with the lemmas you provided. The second method I provided above does work quite well for nouns (the definition is important), but it is extremely slow because it loops through the entire collection of synsets. It almost seems easier to scrape the web for what I need. – Ted Petrou Mar 18 '15 at 12:44
  • `wn.synsets('word')` doesn't return synonyms! It returns the different semantic concepts (senses) of a given word. For example, `wn.synsets('cat')` returns `[Synset('cat.n.01'), Synset('guy.n.01'), ... Synset('caterpillar.n.02'), ... Synset('vomit.v.01')]`. – char bugs Mar 18 '15 at 20:40
  • According to http://stackoverflow.com/questions/19258652/how-to-get-synonyms-from-nltk-wordnet-python the synsets are synonyms. – Ted Petrou Mar 18 '15 at 21:46
  • It may be that *some* synsets are more or less similar, but that doesn't mean they are synonyms. Look at the cat example above: would you say that "vomit" and "caterpillar" are synonyms? – char bugs Mar 18 '15 at 22:50
  • Moreover, consider the example from the link you gave: `wn.synsets('small')` returns quite different concepts. You can check this by looking at the definitions of the several synsets: `wn.synsets('small')[0].definition()` returns `u'the slender part of the back'`, whereas `wn.synsets('small')[2].definition()` shows `u'limited or below average in number or quantity or magnitude or extent'`. – char bugs Mar 18 '15 at 23:14
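Following the point about definitions, a minimal sketch for choosing the right sense before reading off its synonyms: list each synset with its definition and lemma names, and optionally keep only senses whose definition mentions a keyword. The keyword filter is a crude heuristic of my own, not something proposed in the answers; `describe_senses` and `senses_mentioning` are hypothetical names built on the real `Synset.name()`, `definition()` and `lemma_names()` methods.

```python
def describe_senses(synsets):
    """One (name, definition, lemma names) row per synset."""
    return [(s.name(), s.definition(), s.lemma_names()) for s in synsets]


def senses_mentioning(synsets, keyword):
    """Synsets whose definition contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [s for s in synsets if keyword in s.definition().lower()]


# Usage (needs NLTK with WordNet data):
#   from nltk.corpus import wordnet as wn
#   for name, gloss, lemmas in describe_senses(wn.synsets('small')):
#       print(name, '|', gloss, '|', ', '.join(lemmas))
#   senses_mentioning(wn.synsets('small'), 'quantity')
```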