Example code:

import bs4 as bs
import urllib.request
from urllib.request import Request

# Send a browser-like User-Agent with the request.
req = Request("https://www.openbugbounty.org/researchers/Spam404/vip/page/1/",
              headers={'User-Agent': 'Mozilla/5.0'})
sauce = urllib.request.urlopen(req).read()  # raw bytes of the page
soup = bs.BeautifulSoup(sauce, 'lxml')
print(soup)

My output gives me the following error:

  File "/Users/student/Desktop/AutoBots/kbfza2.py", line 15, in <module>
    print(soup)
UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 5194: ordinal not in range(128)

After searching for a solution online for a while, it seemed that changing my soup line to:

soup = bs.BeautifulSoup(sauce.decode('utf-8', 'ignore'), 'lxml')

would solve this, but it hasn't fixed anything.
Am I mistaken in thinking that calling decode with the 'ignore' argument should let me print(soup) without error, even if the text isn't decoded completely?

1 Answer

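Having re-read your question, I believe you are trying to print Unicode text to a console that doesn't support that character set (I couldn't reproduce the error with the code you posted). Note that the traceback is a UnicodeEncodeError raised by print(soup): the failure happens when the text is encoded for output, not when the page is decoded, so passing 'ignore' to decode does not affect it.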
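To illustrate the point, here is a minimal sketch that simulates an ASCII-only console with an in-memory stream. The names `ascii_console` and `text` are made up for this example, and the string only stands in for the page content; it is not the asker's actual environment.

```python
import io

# A string containing a non-breaking space (U+00A0), like the scraped page.
text = "a\xa0b"

# Simulate a console whose output encoding is ASCII (an assumption made
# for the demo; a real console's encoding depends on the environment).
ascii_console = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

try:
    print(text, file=ascii_console)
    ascii_console.flush()
    failed = False
except UnicodeEncodeError as exc:
    # The error is raised while encoding for output, not while decoding.
    failed = True
    reason = str(exc)
```

The string is already valid Unicode here, so nothing done at the decode step could prevent the failure.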

You may need to force your console output to UTF-8 encoding or, if you are running the script from an editor such as Sublime Text, change its settings so that it renders output as UTF-8.
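One way to do this from inside the script is sketched below, assuming Python 3.7 or later, where text streams such as `sys.stdout` provide `reconfigure`. The helper `safe_text` is a hypothetical fallback for cases where the console encoding cannot be changed; it replaces unencodable characters instead of raising.

```python
import sys

# Switch stdout to UTF-8 when the stream supports it (Python 3.7+).
# The hasattr guard covers replaced or redirected streams that lack it.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8", errors="replace")


def safe_text(text, encoding):
    """Return text with characters the encoding cannot represent replaced by '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


# Fallback: print only what the current console encoding can represent.
console_encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
print(safe_text("a\xa0b", console_encoding))
```

Alternatively, setting the environment variable PYTHONIOENCODING=utf-8 before starting Python changes the encoding of the standard streams without touching the script.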

drec4s