7

This is what I did in IPython (I'm using Python 3.6):

from PyDictionary import PyDictionary
dictionary = PyDictionary()
list = dictionary.synonym("life")

And I get the error:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/PyDictionary/utils.py:5: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html5lib"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 5 of the file /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/PyDictionary/utils.py. To get rid of this warning, pass the additional argument 'features="html5lib"' to the BeautifulSoup constructor.

  return BeautifulSoup(requests.get(url).text)
life has no Synonyms in the API

This happens for every word I've tried. Am I doing something wrong? Is the issue that I need to add the argument 'features="html5lib"'? If so, where is the BeautifulSoup constructor, and how do I do this?
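
For reference, the "BeautifulSoup constructor" in the warning is the `BeautifulSoup(...)` call itself (here, the one in PyDictionary/utils.py). A minimal sketch of naming the parser explicitly, assuming the built-in 'html.parser' rather than html5lib so that it runs without extra packages; the `html` string is made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
html = "<ul><li>existence</li><li>being</li></ul>"

# Naming the parser explicitly is what the UserWarning asks for.
soup = BeautifulSoup(html, features="html.parser")
items = [li.text for li in soup.find_all("li")]
print(items)
```

Naming the parser only addresses the warning; on its own it is unlikely to change the "has no Synonyms" message.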

eyllanesc
user1045890

4 Answers

2

This is an updated version of Saran Roy's answer:

import requests
from bs4 import BeautifulSoup

def synonyms(term):
    response = requests.get('https://www.thesaurus.com/browse/{}'.format(term))
    soup = BeautifulSoup(response.text, 'lxml')
    # Search inside the targeted section when it exists, otherwise the whole page.
    section = soup.find('section', {'class': 'css-17ofzyv e1ccqdb60'}) or soup
    # Use 'css-1gyuw4i eh475bn0' instead for less relevant synonyms.
    return [a.text for a in section.find_all('a', {'class': 'css-1kg1yv8 eh475bn0'})]

word = "Input Your Word Here!"
print(synonyms(word))
ofekcohen
1

The PyDictionary.synonym function tries to look up synonyms on thesaurus.com, but the code is out of date: it looks for HTML structures that no longer exist. The following code does basically the same thing:

import requests
from bs4 import BeautifulSoup

def synonyms(term):
    response = requests.get('http://www.thesaurus.com/browse/{}'.format(term))
    soup = BeautifulSoup(response.text, 'html')
    section = soup.find('section', {'class': 'synonyms-container'})
    return [span.text for span in section.findAll('span')]

You may want to add some error handling.
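
A minimal sketch of what that error handling could look like, assuming an empty list is an acceptable result when the request fails or the expected section is missing. The helper name `parse_synonyms` and the `section_class` and `timeout` parameters are hypothetical, introduced only for illustration:

```python
import requests
from bs4 import BeautifulSoup


def parse_synonyms(html, section_class='synonyms-container'):
    """Return the span texts inside the targeted section, or [] if it is absent."""
    soup = BeautifulSoup(html, 'html.parser')
    section = soup.find('section', {'class': section_class})
    if section is None:
        # The page layout changed, or the word has no entry.
        return []
    return [span.text for span in section.find_all('span')]


def synonyms(term, timeout=10):
    """Fetch the page for `term` and parse it; return [] on network or HTTP errors."""
    url = 'http://www.thesaurus.com/browse/{}'.format(term)
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.RequestException:
        return []
    return parse_synonyms(response.text)
```

Keeping the parsing in its own function means it can be checked against a saved HTML snippet without touching the network.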

Nathan Vērzemnieks
  • Thanks for the effort on this, @Nathan Vērzemnieks, but your code is not working either, at least for me, even after several attempts to modify and simplify it. Ultimately there is some problem with a Unicode character coming from Thesaurus.com: "UnicodeEncodeError: 'charmap' codec can't encode character '\ue903' in position 43906: character maps to <undefined>" – Thom Ives Dec 06 '19 at 22:32
  • It worked when I wrote it but - it's been eight months! The page has changed. This is one of the challenges of web scraping. In this case it looks like they've obfuscated the css names, probably with the intent of making scraping harder. All I needed to do to make this work is replace `'synonyms-container'` with `'e1991neq0'` - but that's probably not a sustainable approach. You could also target the structure of the page in a different way - do `section = soup.findAll('ul')[5]` instead - but that's just as prone to breaking. – Nathan Vērzemnieks Dec 07 '19 at 22:41
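
On the `UnicodeEncodeError` from the first comment: that exception is raised when text is encoded, so it most likely comes from printing or writing the scraped text to a console or file that uses a legacy code page, not from the scraping itself. A minimal sketch of one workaround, assuming it is acceptable to replace characters the target encoding cannot represent; `printable` is a hypothetical helper, and 'cp1252' is only an example encoding:

```python
import sys


def printable(text, encoding=None):
    """Replace characters the target encoding cannot represent with '?'."""
    encoding = encoding or sys.stdout.encoding or 'utf-8'
    return text.encode(encoding, errors='replace').decode(encoding)


# '\ue903' is the private-use character named in the error message.
print(printable('life\ue903', encoding='cp1252'))  # prints "life?"
```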
1

An updated version of ofekcohen's answer:

import requests
from bs4 import BeautifulSoup

def synonyms(term):
    response = requests.get('https://www.thesaurus.com/browse/{}'.format(term))
    soup = BeautifulSoup(response.text, 'html.parser')
    # Search inside the targeted section when it exists, otherwise the whole page.
    section = soup.find('section', {'class': 'css-191l5o0-ClassicContentCard e1qo4u830'}) or soup
    return [a.text for a in section.find_all('a', {'class': 'css-1kg1yv8 eh475bn0'})]
wlan_o
  • The same answer is two posts above: https://stackoverflow.com/questions/52910297/pydictionary-word-has-no-synonyms-in-the-api/62394906#62394906 – Beso Nov 29 '21 at 05:39
  • @Beso It uses a different `class`, since the one in [this answer](https://stackoverflow.com/a/62394906/12268505) is not functional anymore. – Jonathan Dec 20 '21 at 19:12
0

Try this:

import requests
from bs4 import BeautifulSoup

def synonyms(term):
    response = requests.get('https://www.thesaurus.com/browse/{}'.format(term))
    soup = BeautifulSoup(response.text, 'lxml')
    # Search inside the synonyms section when it exists, otherwise the whole page.
    section = soup.find('section', {'class': 'synonyms-container'}) or soup
    # Use class 'css-7854fb' instead for less relevant synonyms.
    return [a.text for a in section.find_all('a', {'class': 'css-18rr30y'})]

print(synonyms("reticulum"))

It's just a modified version of Nathan Vērzemnieks's answer.

Saran Roy