
I'm on macOS, using Python 3.6 with Sublime Text 3. When I run my script I get the error in the title. I have already tried everything I could find online to resolve this, but I still have the same issue and no idea how to solve it. Here is my script:

import requests
import subprocess
import time

from bs4 import BeautifulSoup

response = requests.get("https://news.ycombinator.com")
soup = BeautifulSoup(response.content, "html.parser")
for story in soup.find_all(class_="storylink"):
    title = story.get_text()
    print(title + "\n")

The full stack trace (interleaved with the script's normal output) is:

The Land of Lisp

Traceback (most recent call last):
  File "/Users/dave/Programming/Python/ReadHackerNews/read_hackernews.py", line 12, in <module>
Lost Laughs in Leisure Suit Larry

    print(title + "\n")
UnicodeEncodeError: 'ascii' codec can't encode character '\u2013' in position 23: ordinal not in range(128)
[Finished in 1.2s]

The issue is with the title variable, and yes, I understand that it contains some Unicode characters that Python doesn't know how to print (because it uses ASCII?).

What I did get working was printing the text in an escaped form, for example 13\xc2\xa0comments. But I want it printed as the actual Unicode character...
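To illustrate where that escaped form comes from: a minimal sketch, assuming the escaped output was produced by encoding the string to UTF-8 bytes before printing. The sample title is made up for the example. Printing a bytes object shows its repr, so the non-breaking space U+00A0 appears as the two escaped bytes \xc2\xa0 rather than as a character.

```python
# Hypothetical title containing a non-breaking space (U+00A0).
title = "13\u00a0comments"

# Encoding turns the str into a bytes object.
encoded = title.encode("utf-8")

# print() on a bytes object displays its repr, i.e. the escaped form.
shown = str(encoded)
print(shown)

# Decoding the bytes gives back the original text unchanged.
print(encoded.decode("utf-8") == title)
```

So the escaped form is the same text, only displayed as raw bytes; the character itself is not lost.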

If you run the script you need some "luck" to run into the issue, since not every title on Hacker News contains a non-ASCII character. Also, the say command is only present on macOS - remove it if you test on another OS.

EDIT: For fun I tried running the script in the terminal, and there I don't get the error! So this has something to do with Sublime Text 3...

EDIT2: It works if I add sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach()) near the top of the script (this needs import sys and import codecs).
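For reference, here is a small sketch of what that workaround does. The helper name utf8_writer and the sample text are made up for the example, and it writes to an in-memory buffer instead of the real stdout so it can be run anywhere. The idea is that the writer encodes every string as UTF-8 itself, so the ASCII default is never consulted.

```python
import codecs
import io


def utf8_writer(binary_stream):
    """Wrap a binary stream so that str written to it is encoded as UTF-8."""
    return codecs.getwriter("utf-8")(binary_stream)


# Stand-in for the binary stream returned by sys.stdout.detach().
buffer = io.BytesIO()
out = utf8_writer(buffer)

# U+2013 (en dash) is the character named in the traceback.
out.write("Lost \u2013 and found\n")

raw = buffer.getvalue()
print(raw)  # the en dash is stored as the three bytes e2 80 93

# In the real script the equivalent line would be:
#   sys.stdout = utf8_writer(sys.stdout.detach())
```

This only changes how the script writes its output; whether the characters then display correctly still depends on the program reading that output.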

Endogen
  • Could you be more specific please? I guess you mean to use it on `title`? Do I have to encode it or decode it? With which arguments? As you can see, I don't know what is encoded how, so it's not easy for me to understand what I have to encode how – Endogen Oct 06 '17 at 19:06
  • There are a few good answers here: https://stackoverflow.com/questions/7219361/python-and-beautifulsoup-encoding-issues?rq=1 – brddawg Oct 06 '17 at 19:09
  • I added this line `soup.prettify()` before the loop but got the same error. Also tried it on `title` and also with `prettify('latin-1')` always the same error – Endogen Oct 06 '17 at 19:19
  • Also tried `soup = BeautifulSoup(response.content.decode('utf-8','ignore'))` which also runs into the same error – Endogen Oct 06 '17 at 19:22
  • Please post the full traceback and [remove unrelated code](https://stackoverflow.com/help/mcve) from the example. – emulbreh Oct 06 '17 at 19:27
  • You probably get the exception when you call `print` because stdout's encoding is set to ascii. – emulbreh Oct 06 '17 at 19:28
  • @emulbreh It seems to work if I add `sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())`, but why does it work in the terminal and not in Sublime Text? Please read my edit – Endogen Oct 06 '17 at 19:47
  • `sys.stdout.encoding` is chosen [based on the environment](https://docs.python.org/3/library/sys.html#sys.stdout). When Sublime runs the Python subprocess it doesn't include a locale variable which you have set in your terminal. – emulbreh Oct 06 '17 at 20:07
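The last comment can be checked with a short sketch like the one below. The helper name child_stdout_encoding is made up for the example. It starts a fresh interpreter with a modified environment and reports which encoding that interpreter picked for stdout. PYTHONIOENCODING is a documented CPython environment variable that overrides the choice; the value reported for the inherited environment depends on the machine and the Python version, so no particular result is claimed for it.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Run a fresh interpreter with extra environment variables and
    return the sys.stdout.encoding that it reports."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.strip()


# Whatever the current environment leads to (varies by setup).
inherited = child_stdout_encoding({})
print("inherited environment:", inherited)

# Forcing the encoding through the environment.
forced = child_stdout_encoding({"PYTHONIOENCODING": "utf-8"})
print("with PYTHONIOENCODING=utf-8:", forced)
```

If the same variable is set for the process that Sublime Text starts (as far as I know, build systems accept an `env` setting for this, but treat that as an assumption to verify), the script should not need the `sys.stdout` rewrapping at all.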
