My goal is to read an HTML document using Beautiful Soup, add ids
to some tags, and write the HTML back to a file.
The HTML document has HTML entities, such as the numeric character reference for ☒. When I create a Beautiful Soup object, the entity gets converted to the literal character ☒. When I write the soup back to HTML using str(soup), the HTML file contains the literal ☒ instead of the original entity. Opening this file in a browser shows garbled characters when I want ☒.
I tried using str(soup.encode(formatter='html')), which did convert the output to UTF-8, but the HTML in the browser shows \xe2\x98\x92.
I'm guessing there is something simple that I'm missing. Any thoughts on how to keep the special characters in the original document intact after processing it with Beautiful Soup?
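
The behaviour described above can be reproduced with the standard library alone. The following is a minimal sketch, not a confirmed fix: a plain string stands in for the result of str(soup), and the markup and variable names are made up for illustration. It suggests that the \xe2\x98\x92 text comes from calling str() on a bytes object, and that encoding to ASCII with Python's xmlcharrefreplace error handler is one possible way to get a numeric character reference back in the output.

```python
# Hypothetical stand-in for the text returned by str(soup).
html_text = '<p id="box-1">\u2612</p>'

# Encoding to UTF-8 produces a bytes object. Calling str() on bytes returns
# its repr, so the three UTF-8 bytes of U+2612 appear as literal backslash
# escapes in the resulting string.
utf8_bytes = html_text.encode("utf-8")
as_repr = str(utf8_bytes)
print(as_repr)

# Decoding (rather than str()) recovers the original text unchanged.
round_trip = utf8_bytes.decode("utf-8")

# Encoding to ASCII with xmlcharrefreplace swaps every non-ASCII character
# for a decimal numeric character reference.
ascii_bytes = html_text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)  # b'<p id="box-1">&#9746;</p>'
```

If the literal character is preferred over an entity, an alternative under the same assumptions is to write the text with an explicit encoding, for example open(path, "w", encoding="utf-8"), and make sure the page declares that encoding so the browser does not guess a different one.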