I keep getting the following error when trying to parse some html using BeautifulSoup:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xae in position 0: ordinal not in range(128)
I've tried decoding the html using the solution to the questions below, but keep getting the same error. I've tried all the solutions to the questions below but none of them work (posting so that I don't get duplicate answers and in case they help anyone to find a solution by viewing related approaches to the problem).
Anybody know where I'm going wrong here? Is this a bug in BeautifulSoup and should I install an earlier version?
EDIT: code and traceback below:
from BeautifulSoup import BeautifulSoup as bs
soup = bs(html)
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 1282, in __init__
BeautifulStoneSoup.__init__(self, *args, **kwargs)
File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 946, in __init__
self._feed()
File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 971, in _feed
SGMLParser.feed(self, markup)
File "/usr/lib/python2.5/sgmllib.py", line 99, in feed
self.goahead(0)
File "/usr/lib/python2.5/sgmllib.py", line 133, in goahead
k = self.parse_starttag(i)
File "/usr/lib/python2.5/sgmllib.py", line 285, in parse_starttag
self._convert_ref, attrvalue)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xae in position 0: ordinal not in range(128)
EDIT: error message per comment below:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 1282, in __init__
BeautifulStoneSoup.__init__(self, *args, **kwargs)
File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 946, in __init__
self._feed()
File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 971, in _feed
SGMLParser.feed(self, markup)
File "/usr/lib/python2.5/sgmllib.py", line 99, in feed
self.goahead(0)
File "/usr/lib/python2.5/sgmllib.py", line 133, in goahead
k = self.parse_starttag(i)
File "/usr/lib/python2.5/sgmllib.py", line 285, in parse_starttag
self._convert_ref, attrvalue)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xae in position 0: ordinal not in range(128)
Thanks for your help!
'ascii' codec error in beautifulsoup
How do I convert a file's format from Unicode to ASCII using Python?
python UnicodeEncodeError > How can I simply remove troubling unicode characters?