I'm trying to decode characters which have been encoded in the following way:
&#number;
I tried:

 s.decode("utf8")

and:

 s.decode("unicode-escape")

but neither seems to work.

What encoding should I use to decode this kind of string?

In general - where can I find a list of all valid encodings?

tomermes
  • See also [Convert XML HTML entities into Unicode string in Python](http://stackoverflow.com/questions/57708/convert-xml-html-entities-into-unicode-string-in-python) – Kos May 11 '13 at 10:10

1 Answer

Python 2:

import HTMLParser
h = HTMLParser.HTMLParser()
print h.unescape('&pound;682m')
£682m

Python 3:

import html
# html.unescape is available since Python 3.4; HTMLParser.unescape was removed in 3.9
print(html.unescape('&pound;682m'))
£682m
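
Since the question is about the numeric `&#number;` form, here is a minimal sketch, assuming Python 3.4 or later, of how `html.unescape` treats decimal, hexadecimal and named references alike. The sample strings are made up for illustration and are not from the question.

```python
import html

# Illustrative inputs: decimal, hexadecimal and named forms of the pound sign.
samples = ["&#163;682m", "&#xA3;682m", "&pound;682m"]

# html.unescape replaces every character reference it recognises.
decoded = [html.unescape(s) for s in samples]

for source, result in zip(samples, decoded):
    print(source, "->", result)
```

All three inputs should come out as the same string, because the references are HTML markup rather than a byte encoding, which is why `.decode()` does not help here.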

.encode and .decode work a little differently than you might expect, I'm afraid. See the following:

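print repr('å'.decode('iso-8859-1'))
u'\x86'

The string was encoded with my console's code page when I typed it in (å became the byte 0x86), and .decode('iso-8859-1') interprets that byte as ISO-8859-1 (Latin-1), which gives u'\x86' rather than å. If my end-point uses ISO-8859-1, I have to decode with the console's actual encoding first and can then re-encode to fit the end-point's character encoding.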
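
As a rough Python 3 illustration of the same point: the sketch below assumes the console used a DOS code page such as cp437, which is only inferred from the 0x86 byte and is not stated in the answer. The variable names are hypothetical.

```python
# Assumption: the console encoded U+00E5 (a with ring) using code page 437.
raw = "\u00e5".encode("cp437")
print(raw)  # b'\x86'

# Decoding with the wrong codec succeeds but yields a control character.
wrong = raw.decode("iso-8859-1")

# Decoding with the codec that produced the bytes recovers the text.
right = raw.decode("cp437")

# Only now can the text be re-encoded for an ISO-8859-1 end-point.
for_endpoint = right.encode("iso-8859-1")
print(repr(wrong), repr(right), for_endpoint)
```

The design point is that decode turns bytes into text and encode turns text into bytes, so a conversion between two encodings always goes through text in the middle.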

For more info on character encodings: http://en.wikipedia.org/wiki/Character_encoding
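
On where to find a list of valid encodings: for Python specifically, the standard library documentation lists them under "Standard Encodings" in the `codecs` module. A minimal sketch of enumerating the known names at runtime, assuming Python 3, could look like this. The alias table may not cover every codec that is installed.

```python
import codecs
from encodings.aliases import aliases

# The alias table maps alternative spellings to codec module names.
codec_names = sorted(set(aliases.values()))
print(len(codec_names), "codec names found via the alias table")

# codecs.lookup resolves any accepted spelling to a canonical name.
latin1_name = codecs.lookup("latin-1").name
iso_name = codecs.lookup("iso-8859-1").name
print(latin1_name, iso_name)
```

The last two lookups should print the same name, since Latin-1 and ISO-8859-1 are two labels for one encoding.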

Torxed
  • Thank you for your answer - but I was asking about translating back something in the format: &#(some_number); what you provided doesn't work for that – tomermes May 11 '13 at 10:03
  • `&#number;` -> `symbol`; it's also possible to do `symbol` -> `&#number;` via `h.escape()`, obviously. Your question was that you **had** a number and wanted to "decode" it, and that's what my solution does. Ask your question correctly if you want another answer, but as I mentioned, you can do this in reverse and get the opposite. – Torxed May 11 '13 at 11:21
  • Sorry! You were right, your code does work. I have another problem while trying to write it into a file, but that's a whole other story. – tomermes May 11 '13 at 13:23