Code:
import requests
import urllib
from bs4 import BeautifulSoup
page1 = urllib.request.urlopen("http://en.wikipedia.org/wiki/List_of_human_stampedes")
soup = BeautifulSoup(page1)
print(soup.get_text())
print(soup.prettify())
Error:
Traceback (most recent call last):
File "C:\Users\sony\Desktop\Trash\Crawler Try\try2.py", line 9, in <module>
print(soup.get_text())
File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u014d' in position 10487: character maps to <undefined>
I think the problem lies mainly with urlib package. Here I am using urllib3 package. They changed the urlopen syntax from 2 to 3 version, which maybe the cause of error. But that being said I have included the latest syntax only. Python version 3.4