Python3 UnicodeEncodingError

Question

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome(executable_path=r'C:\chromedriver_win32\chromedriver.exe')

driver.get('https://www.imdb.com/')

html_doc = driver.page_source

soup = BeautifulSoup(html_doc, 'lxml')
print(soup.prettify())

driver.quit()

I tried this code and it gives the following error:

Traceback (most recent call last):
  File "E:\Practice\WebScraping\webscrape.py", line 11, in <module>
    print(soup.prettify())
  File "C:\Users\vmbck\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u25ec' in position 241524: character maps to <undefined>

Then I tried encode("utf-8"):

html_doc = driver.page_source.encode("utf-8")

but it still gives the same error.

How can I get page_source without getting a UnicodeEncodeError?
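The traceback shows the failure happens inside print(), not inside driver.page_source: print() has to encode the str with the console's codec (cp1252 here), and that codec has no mapping for U+25EC. Encoding page_source to UTF-8 first does not help, because BeautifulSoup decodes the bytes back into a str and prettify() returns a str again. The stdlib-only sketch below reproduces the same error without Selenium; can_encode is a hypothetical helper written for illustration.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text can be represented in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


char = "\u25ec"  # the character reported in the traceback

print(can_encode(char, "cp1252"))  # False: cp1252 has no mapping for U+25EC
print(can_encode(char, "utf-8"))   # True: UTF-8 can encode it

# Reproduce the exact error message from the question.
try:
    char.encode("cp1252")
except UnicodeEncodeError as exc:
    message = str(exc)
print(message)  # 'charmap' codec can't encode character '\u25ec' in position 0: character maps to <undefined>
```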

Thank you very much... I fixed it with html_doc = ascii(driver.page_source) — Buddhika Chathuranga, May 14 '18 at 14:47
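A caveat about the ascii() fix in the comment above: ascii() returns the repr() of the string, so the result is wrapped in quotes and every newline and non-ASCII character becomes a literal backslash escape before BeautifulSoup ever parses it. The sketch below compares that with str.encode(..., errors="xmlcharrefreplace"), which is offered only as one possible alternative (not something from the thread); it keeps the HTML intact and swaps unencodable characters for numeric character references that HTML parsers turn back into the original characters. The sample string is made up.

```python
page = "<p>caf\u00e9 \u25ec</p>\n"  # hypothetical stand-in for driver.page_source

# ascii() returns the repr of the string: wrapped in quotes, with every
# non-ASCII character and the newline turned into a backslash escape.
via_ascii = ascii(page)

# Alternative: replace only the unencodable characters with numeric
# character references, which HTML parsers decode back to the original.
via_charref = page.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(via_ascii)    # '<p>caf\xe9 \u25ec</p>\n'
print(via_charref)  # <p>caf&#233; &#9708;</p>
```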

score 1 · Accepted Answer · answered May 14 '18 at 13:23

import requests
from bs4 import BeautifulSoup
a = requests.get('https://www.imdb.com/')
soup = BeautifulSoup(a.content, 'lxml')
print(soup.prettify())

The code above does something similar to what you have written. To solve the Unicode error, though, you can try what was suggested in the following post: Python Unicode Encode Error
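One common way around the console codec, which may or may not be what the linked post recommends, is to not print the page at all and write it to a file opened with an explicit UTF-8 encoding. Other options include setting the PYTHONIOENCODING=utf-8 environment variable or, on Python 3.7+, calling sys.stdout.reconfigure(encoding="utf-8"). In the scripts above the call would be something like save_html(soup.prettify(), "imdb.html"); save_html and the file name are hypothetical, and the sample string stands in for the real page.

```python
import os
import tempfile


def save_html(html_text: str, path: str) -> None:
    """Write HTML to disk as UTF-8 so the console encoding never comes into play."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html_text)


# Stand-in for soup.prettify(); contains the character from the traceback.
sample = "<p>\u25ec</p>\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.html")
    save_html(sample, path)
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print(round_trip == sample)  # True
```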

score -1 · Answer 2 · answered May 14 '18 at 13:22


If encoding to UTF-8 fails, try encoding to ASCII.

Try both:

print(soup.prettify('utf-8'))  # returns bytes, so print() shows a b'...' literal

and

print(soup.prettify('ascii'))  # unencodable characters become &#...; references
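Encoding before printing avoids the crash because print() then receives a bytes object and writes its repr(), which is plain ASCII; the console codec never sees U+25EC itself. The trade-off is that the output shows escape sequences rather than the readable character. A stdlib-only sketch with just the character from the traceback:

```python
text = "\u25ec"  # the character from the traceback

encoded = text.encode("utf-8")

# print() on a bytes object writes its repr(), which is plain ASCII,
# so the console codec never has to encode U+25EC.
print(encoded)  # b'\xe2\x97\xac'

# The bytes are still a lossless UTF-8 encoding of the original text.
print(encoded.decode("utf-8") == text)  # True
```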

answered May 14 '18 at 13:22

bhavesh27


Found this too... The problem is that the console you're running on is not capable of handling the character you're trying to print: … See stackoverflow.com/questions/3597480/… for some hints. – bhavesh27 May 14 '18 at 13:26
ASCII is a (very small) subset of UTF-8, so the best that using the ASCII encoding can do is nothing. However, it is guaranteed to corrupt Unicode characters with a code point above 127. – Dragonthoughts May 15 '18 at 21:46
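The last comment's point can be checked directly with str.encode and its standard error handlers. The sample string below is made up: with the default errors="strict" the ASCII codec raises, and the handlers that do succeed only do so by replacing or dropping every character above U+007F, while UTF-8 round-trips the text unchanged.

```python
text = "caf\u00e9 \u25ec"  # hypothetical sample with two non-ASCII characters

# errors="strict" (the default) raises instead of guessing.
try:
    text.encode("ascii")
except UnicodeEncodeError:
    strict_failed = True
else:
    strict_failed = False

replaced = text.encode("ascii", errors="replace")  # unencodable chars become b"?"
ignored = text.encode("ascii", errors="ignore")    # unencodable chars are dropped
lossless = text.encode("utf-8").decode("utf-8")    # UTF-8 round-trips exactly

print(strict_failed)     # True
print(replaced)          # b'caf? ?'
print(ignored)           # b'caf '
print(lossless == text)  # True
```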
