I am using BeautifulSoup to scrape data from an HTML page. Until yesterday everything was fine, but now I am getting this error:

'ascii' codec can't encode character u'\xa9' in position 86700: ordinal not in range(128)

I am using this code:

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)

This is giving me the error.

user12345

2 Answers

A wild guess:

Try specifying the encoding of the page?

soup = BeautifulSoup(page, fromEncoding=<encoding of the page>)
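To see why the encoding matters, here is a minimal sketch that decodes the same bytes two ways. The `page` bytes are a made-up stand-in for what `urllib2.urlopen(url).read()` returns, and it assumes the real page is UTF-8; your page may use something else.

```python
# Hypothetical bytes standing in for urllib2.urlopen(url).read();
# they contain the UTF-8 encoding of the copyright sign (U+00A9).
page = b"<p>\xc2\xa9 2010</p>"

# Assumption: the page really is UTF-8, so this yields the intended text.
as_utf8 = page.decode("utf-8")

# A wrong guess does not raise here, but it silently mangles the character
# into two characters (U+00C2 followed by U+00A9).
as_latin1 = page.decode("latin-1")

print(repr(as_utf8))
print(repr(as_latin1))
```

If the parser is left to guess and guesses wrong, the text ends up mangled in the same way, which is why passing the correct encoding explicitly is worth trying.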

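This could also be a problem with the Python installation. If you print non-ASCII characters without BeautifulSoup, do you see the same error? If so, you need to set the default encoding. Note that in Python 2, site.py removes sys.setdefaultencoding at startup, so sys has to be reloaded first:

import sys
reload(sys)  # Python 2 only: restores setdefaultencoding, which site.py deletes at startup
sys.setdefaultencoding("utf-8")  # or whatever you want the default encoding to be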
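To check whether the problem exists without BeautifulSoup, something like the following sketch may help. It reproduces the error from the question using only the character named in the message, then shows that encoding explicitly avoids it. The variable names are illustrative only.

```python
# The character named in the error message: U+00A9, the copyright sign.
copyright_sign = u"\xa9"

# Encoding it with the ASCII codec fails, because ASCII only covers 0-127.
try:
    copyright_sign.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# Encoding explicitly with a codec that can represent the character works,
# without relying on the interpreter's default encoding.
utf8_bytes = copyright_sign.encode("utf-8")

print(error_message)
print(repr(utf8_bytes))
```

If this snippet fails in the same way as your script, the error comes from an implicit ASCII conversion somewhere (printing, writing to a file, string concatenation) rather than from BeautifulSoup itself.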

user225312

A wild stab in the dark: you're reading a page that doesn't explicitly declare an encoding and yet is not 7-bit ASCII?
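One way to test this guess is sketched below, under the assumption that you have the raw page bytes and the `Content-Type` header value at hand. Both helper functions are hypothetical and not part of BeautifulSoup or urllib2.

```python
import re


def is_seven_bit_ascii(data):
    """Return True if every byte in `data` is below 128."""
    return all(byte < 128 for byte in bytearray(data))


def declared_charset(content_type):
    """Extract the charset from a Content-Type header value, or None."""
    match = re.search(r'charset="?([\w-]+)', content_type or "", re.IGNORECASE)
    return match.group(1).lower() if match else None


# Hypothetical inputs standing in for the real response.
sample_page = b"<p>\xc2\xa9 2010</p>"
sample_header = "text/html"

print(is_seven_bit_ascii(sample_page))   # non-ASCII bytes are present
print(declared_charset(sample_header))   # no charset declared
```

A page for which the first check is false and the second returns `None` matches the situation described here: the parser has to guess the encoding. This sketch only looks at the HTTP header; a page can also declare its encoding in a `<meta>` tag, which it does not inspect.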

Ulrich Schwarz