I have a Python script that fetches some HTML and parses it with Beautiful Soup. Sometimes the HTML contains non-ASCII characters, and they cause errors in my script and in the file I am creating.
Here is how I am getting the HTML:
html = urllib2.urlopen(url).read().replace(' ',"")
xml = etree.HTML(html)
When I use this instead:
html = urllib2.urlopen(url).read().encode('ascii', 'xmlcharrefreplace')
I get a UnicodeDecodeError.
How can I convert the HTML to Unicode so that non-ASCII characters don't break my code?
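For context on the error: in Python 2, calling .encode() on a byte string first decodes it implicitly as ASCII, and that hidden step is what raises UnicodeDecodeError. Below is a minimal sketch of the decode-then-encode order that usually avoids it. It assumes the page is UTF-8, which may not hold for every site (the real charset normally comes from the Content-Type header or a meta tag). The helper name to_ascii_with_charrefs and its source_encoding parameter are hypothetical, and the raw byte string stands in for the result of urllib2.urlopen(url).read().

```python
# -*- coding: utf-8 -*-

def to_ascii_with_charrefs(raw_bytes, source_encoding="utf-8"):
    """Hypothetical helper: bytes -> ASCII-only bytes with XML character references."""
    # Decode explicitly first. The 'replace' handler substitutes U+FFFD for
    # bytes that are invalid in source_encoding instead of raising an error.
    text = raw_bytes.decode(source_encoding, "replace")
    # Encoding a text (unicode) object is safe: every non-ASCII character
    # becomes a numeric reference such as &#233;.
    return text.encode("ascii", "xmlcharrefreplace")

# Stand-in for urllib2.urlopen(url).read(); assumed to be UTF-8 bytes.
raw = b"caf\xc3\xa9 &amp; cr\xc3\xa8me"
result = to_ascii_with_charrefs(raw)
```

The result contains only ASCII bytes, so it can be written to a file or passed to etree.HTML without further encoding errors. If the page uses another charset, pass that as source_encoding rather than relying on the UTF-8 default.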