
I am currently taking a course on Python, and during our unit on Beautiful Soup the instructor uses the following code:

import requests, pprint
from bs4 import BeautifulSoup

url = 'https://www.epicurious.com/search/tofu%20chili'
response = requests.get(url)
page_soup = BeautifulSoup(response.content, 'lxml')
print(page_soup.prettify())

When I run this code, I get the following error:

Traceback (most recent call last):
  File "/Users/arocklin/Documents/Python/whiteboard2.py", line 11, in <module>
    print(page_soup)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 1479: ordinal not in range(128)

Why do I get this error when the same code worked for him, and how can I fix it going forward? Thanks!

ndim
anonym00se

1 Answer


Your problem is not related to BeautifulSoup or to parsing HTML. Everything in your code up to and including the call to prettify() just gives you a Unicode string whose content is determined by a web server that is not under your control.

You then try to print that more or less arbitrary Unicode string.

If Python has determined that the terminal behind sys.stdout can only handle ASCII-encoded strings, and the web server has (for reasons entirely beyond your control) decided to send you characters outside the ASCII range, Python cannot encode those characters and raises an exception. In your traceback the offending character is '\xe9' (é).
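To make the mechanism concrete, here is a minimal sketch that reproduces the failure without a terminal or a web request. The helper name can_print and the in-memory stream are assumptions for illustration only; the stream stands in for a sys.stdout that Python has configured with a given encoding.

```python
import io

def can_print(text, encoding):
    """Return True if print() can write text to a stream using this encoding."""
    # In-memory stand-in for sys.stdout with a fixed encoding (hypothetical
    # setup; the real sys.stdout encoding is chosen by Python at startup).
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError:
        return False
    return True

print(can_print("foo\xe9bar", "ascii"))  # False: é has no ASCII encoding
print(can_print("foo\xe9bar", "utf-8"))  # True
```

The same string prints fine or fails depending only on the encoding of the output stream, which is why the code can work on one machine and not on another.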

I suggest you research how your version of Python determines which encoding/codec to use for output on the platform you are running it on.
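As a starting point for that research, the following sketch prints what the running interpreter reports about its output encoding. The function name describe_output_encoding is made up for this example; the attributes and functions it reads are part of the standard library.

```python
import locale
import os
import sys

def describe_output_encoding():
    """Collect the settings that usually influence how print() encodes text."""
    return {
        # Encoding Python picked for standard output (may be None if replaced).
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the locale settings of the environment.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Optional environment override for the standard streams, if set.
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }

for name, value in describe_output_encoding().items():
    print(name, "=", value)
```

Run it both in your terminal and from inside your editor. If the two disagree, the editor is most likely launching Python with different environment or locale settings. If stdout_encoding turns out to be ASCII, setting the environment variable PYTHONIOENCODING=utf-8 before Python starts is one common workaround, though whether it suits your setup is something you would need to check.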

Then put a test case into your program's test suite which actually verifies it can properly output Unicode strings. For that test, you can replace your entire program with

print(u"foo\xe9bar")
ndim
  • Thank you so much. It seems this issue is specific to Atom for some reason; I'm not sure whether you know of a solution for that. I am using Atom V 1.20.1 x64 on Mac OS Sierra 10.12. My terminal can print the output, but Atom cannot. – anonym00se Sep 24 '17 at 21:35