from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome(executable_path=r'C:\chromedriver_win32\chromedriver.exe')

driver.get('https://www.imdb.com/')

html_doc = driver.page_source

soup = BeautifulSoup(html_doc, 'lxml')
print(soup.prettify())

driver.quit()

I tried this code and it gives the following error:

Traceback (most recent call last):
  File "E:\Practice\WebScraping\webscrape.py", line 11, in <module>
    print(soup.prettify())
  File "C:\Users\vmbck\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u25ec' in position 241524: character maps to <undefined>

Then I tried it with encode("utf-8"):

html_doc = driver.page_source.encode("utf-8")

It gives the same error again.

How can I get page_source and print it without getting a UnicodeEncodeError?

Buddhika Chathuranga

2 Answers

import requests
from bs4 import BeautifulSoup
a = requests.get('https://www.imdb.com/')
soup = BeautifulSoup(a.content, 'lxml')
print(soup.prettify())

The code above does roughly the same thing as yours, using requests instead of Selenium. To solve the Unicode error, try what is suggested in the following post: Python Unicode Encode Error
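A minimal sketch of one possible workaround, assuming the error comes from the Windows console using cp1252, as the traceback suggests. The helper name `safe_text` is hypothetical; it replaces any character the target encoding cannot represent before the text reaches `print`.

```python
import sys

def safe_text(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'."""
    # Default to the console's encoding; fall back to UTF-8 if it is unknown.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # Encode with a replacement policy, then decode back to a printable str.
    return text.encode(encoding, errors="replace").decode(encoding)

# '\u25ec' is the character named in the traceback; cp1252 has no mapping for it.
sample = "IMDb \u25ec test"
print(safe_text(sample, "cp1252"))
```

In the script from the question this would be used as `print(safe_text(soup.prettify()))`. If every character has to be preserved, writing the text to a file opened with `encoding="utf-8"` avoids the console's encoding altogether.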

rawwar

If encoding to UTF-8 fails, try encoding to ASCII.

Try both:

print(soup.prettify(encoding='utf-8'))

and

print(soup.prettify(encoding='ascii'))
bhavesh27
  • Found this too: the problem is that the console you're running on is not capable of handling the character you're trying to print: … See stackoverflow.com/questions/3597480/… for some hints. – bhavesh27 May 14 '18 at 13:26
  • ASCII is a (very small) subset of UTF-8, so the best that using ASCII encoding can do is nothing. However, it is guaranteed to corrupt any Unicode character with a code point above 127. – Dragonthoughts May 15 '18 at 21:46
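
The point made in the last comment can be checked with a short sketch. The sample string is an assumption chosen for illustration; it contains 'é' (code point 233) and the '\u25ec' character from the traceback.

```python
text = "caf\u00e9 \u25ec"

# UTF-8 can represent every code point, so the round trip is lossless.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# ASCII only covers code points 0-127; with errors="replace" each
# character outside that range is turned into '?'.
ascii_bytes = text.encode("ascii", errors="replace")

# For pure ASCII text the two encodings produce identical bytes.
same_for_ascii = "plain".encode("ascii") == "plain".encode("utf-8")

print(round_trip == text, ascii_bytes, same_for_ascii)
```

With the default strict error handling, `text.encode("ascii")` raises a UnicodeEncodeError instead of substituting characters.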