Python 2.7
I have a program that gets video titles from the source code of a webpage, but the titles are encoded in some HTML format.
This is what I've tried so far:
>>> import urllib2
>>> urllib2.unquote('&pound;')
'&pound;'
So that didn't work... Then I tried:
>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> h.unescape('&pound;')
u'\xa3'
As you can see, that doesn't work either, nor does any combination of the two.
I managed to find out that '&pound;' is an HTML character entity name, but I wasn't able to find out what '\xa3' is.
Does anyone know how to convert HTML content into a readable format in Python?
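For reference, here is a minimal sketch of the second attempt. It suggests that the entity may already be getting decoded, and that u'\xa3' is only the console's repr of the one-character Unicode string U+00A3 (the pound sign). The names `unescape`, `decoded` and `is_pound` are placeholders I made up for illustration, and the `html.unescape` fallback is an assumption so that the same snippet also runs on Python 3.4+:

```python
# -*- coding: utf-8 -*-
# Sketch only: decode an HTML character entity and inspect the result.
try:
    from html import unescape            # Python 3.4+ (assumed fallback)
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7, as in the question
    unescape = HTMLParser().unescape

decoded = unescape('&pound;')

# The decoded value is a Unicode string of length 1 holding U+00A3.
# u'\xa3' is how the interactive console displays it, not the text itself.
is_pound = (decoded == u'\xa3')
print(is_pound)
print(len(decoded))
```

If that is right, then `print h.unescape('&pound;')` in a Python 2.7 console should display the character itself rather than its escaped repr, provided the terminal encoding can represent it.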