error: UnicodeEncodeError: 'gbk' codec can't encode character

Question

I'm a Python beginner. I wrote the following code:

from bs4 import BeautifulSoup
import requests

url = "http://www.google.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
links = soup.find_all("a")
for link in links:
    print(link.text)

When I run this .py file in Windows PowerShell, the print(link.text) call raises the following error.

error: UnicodeEncodeError: 'gbk' codec can't encode character '\xbb' in position 5: illegal multibyte sequence

I know the error is caused by some Chinese characters, and it seems like I should use 'decode' or 'ignore', but I don't know how to fix my code. Please help! Thanks!

score 0 · Answer 1 · answered Feb 28 '17 at 03:06


If you don't need to display those special characters, you can ignore them:

print(link.text.encode(errors="ignore"))
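One caveat worth noting: in Python 3, printing the result of .encode() shows a bytes literal (b'...') rather than plain text, and calling .encode() with no encoding argument targets UTF-8, which can represent every character, so nothing is actually dropped. A minimal sketch of one way to drop (or replace) only the characters the console cannot show is to encode to the console's own encoding and decode back. The helper name console_safe is hypothetical, not part of any library, and the sample string stands in for link.text.

```python
import sys


def console_safe(text, encoding, errors="ignore"):
    """Return text with characters that `encoding` cannot represent
    dropped (errors="ignore") or turned into "?" (errors="replace")."""
    # Encoding discards/replaces the unencodable characters;
    # decoding with the same codec turns the bytes back into a str.
    return text.encode(encoding, errors=errors).decode(encoding)


# Assumption: the console encoding is whatever Python reports for stdout
# (for example 'gbk' on a Chinese-locale Windows console).
target = getattr(sys.stdout, "encoding", None) or "utf-8"

# Stand-in for `link.text` from the question.
print(console_safe("Next \xbb", target))
```

Passing errors="replace" instead keeps a visible "?" where a character was removed, which can make missing text easier to spot.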

answered Feb 28 '17 at 03:06

Anurag


score 0 · Answer 2 · edited May 23 '17 at 11:53


You can encode the string as UTF-8.

for link in links:
    print(link.text.encode('utf8'))

But a better approach is:

response = requests.get(url)
soup = BeautifulSoup(response.text.encode("utf8"), "html.parser")

To understand more about the problem you are facing, you should look at this Stack Overflow answer.
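As a complementary sketch, and not something this answer itself proposes, the output stream can be made tolerant so that print() never raises on characters the console encoding lacks. The example below simulates a GBK-only console with an in-memory buffer; the helper name open_lenient_writer and the sample string are hypothetical.

```python
import io


def open_lenient_writer(raw, encoding, errors="replace"):
    """Wrap a binary stream in a text writer that substitutes "?" for
    characters `encoding` cannot represent instead of raising."""
    return io.TextIOWrapper(raw, encoding=encoding, errors=errors)


# Simulated console: an in-memory byte buffer written to as GBK.
fake_console = io.BytesIO()
writer = open_lenient_writer(fake_console, "gbk")

# Stand-in for `link.text`; characters GBK cannot encode are replaced.
print("Next \xbb", file=writer)
writer.flush()
```

On Python 3.7 and later, sys.stdout.reconfigure(errors="replace") applies the same error handling to the real console stream without wrapping anything.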

edited May 23 '17 at 11:53

Community


answered Feb 28 '17 at 03:10

Wasi Ahmad

