
I have HTML text like this:

&lt;xml ... &gt;

and I want to convert it to something readable:

<xml ...>

Any easy (and fast) way to do it in Python?

Alexandru
  • I think the question is a duplicate of this: http://stackoverflow.com/questions/57708/convert-xml-html-entities-into-unicode-string-in-python – Fred Larson Apr 08 '09 at 14:37
  • Possible duplicate of [Decode HTML entities in Python string?](http://stackoverflow.com/questions/2087370/decode-html-entities-in-python-string) – csl Sep 22 '16 at 07:07
  • Best approach here https://stackoverflow.com/questions/2360598/how-do-i-unescape-html-entities-in-a-string-in-python-3-1 – fmalina Nov 06 '19 at 16:41

3 Answers


Python >= 3.4

Official documentation for html.unescape: Python 3

>>> from html import unescape
>>> unescape('&copy; &euro;')
'© €'

Python 3 before 3.4

Official documentation for HTMLParser: Python 3

>>> from html.parser import HTMLParser
>>> pars = HTMLParser()
>>> pars.unescape('&copy; &euro;')
'© €'

Note: HTMLParser.unescape() was deprecated in Python 3.4 in favor of html.unescape() and removed in Python 3.9.

Python 2.7

Official documentation for HTMLParser: Python 2.7

>>> import HTMLParser
>>> pars = HTMLParser.HTMLParser()
>>> pars.unescape('&copy; &euro;')
u'\xa9 \u20ac'
>>> print _
© €
vartec
  • unescape is just an internal function of HTMLParser (and it's not documented in your link). However, I could use the implementation. Thanks a lot – Alexandru Apr 08 '09 at 15:54
  • @brtzsnr: true that it's undocumented. I don't think it's internal, though; after all, the name is unescape, not _unescape or __unescape. – vartec Apr 08 '09 at 16:01

Modern Python 3 approach:

>>> import html
>>> html.unescape('&copy; &euro;')
'© €'

https://docs.python.org/3/library/html.html

fmalina

There is a function here that does it, as linked from the post Fred pointed out. Copied here to make things easier.

Credit to Fred Larson for linking to the other question on SO. Credit to dF for posting the link.

Benson