I have a string of escaped HTML markup, 'í', and I want to convert it to the correct accented character, 'í'.
Having read around SO, this is my attempt:
messy = 'í'
print type(messy)
>>> <type 'str'>
decoded=messy.decode('utf-8')
print decoded
>>> í
Drats. After reading a related SO question, I tried this:
from BeautifulSoup import *
soup = BeautifulSoup(messy, convertEntities=BeautifulSoup.HTML_ENTITIES)
print soup.contents[0].string
>>> í
Still not working, so I tested the example from the SO question I linked to previously.
html = 'Ä'
soup = BeautifulSoup(html, convertEntities=BeautifulSoup.HTML_ENTITIES)
print soup.contents[0].string
>>> Ä
This one works. Does anyone see what I am missing?
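For reference, here is a minimal sketch of the conversion I am after. It uses Python 3's standard-library `html.unescape` rather than the Python 2 / BeautifulSoup 3 setup above. The helper name `decode_entities` and the sample inputs `&#237;`, `&Auml;` and `&amp;#237;` are illustrative assumptions, not necessarily the exact strings from my data. The last case shows one possible explanation worth ruling out: if the input were escaped twice, a single decoding pass would leave an entity behind.

```python
import html


def decode_entities(text):
    """Decode named and numeric HTML character references in one pass."""
    return html.unescape(text)


# Illustrative inputs (assumptions, not the exact strings from the question).
numeric = decode_entities("&#237;")      # numeric reference for U+00ED
named = decode_entities("&Auml;")        # named reference for U+00C4
once = decode_entities("&amp;#237;")     # doubly escaped: one pass only undoes &amp;
twice = decode_entities(once)            # a second pass decodes the remaining reference

print(repr(numeric), repr(named), repr(once), repr(twice))
```

If the second pass is what finally produces the accented character, the string was probably escaped twice somewhere upstream.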