python - possible encoding and decoding values

Question

I'm trying to decode characters which have been encoded in the following way:
&#number;
I tried:

 s.decode("utf8")

and:

 s.decode("unicode-escape")

but neither seems to work.

Which encoding should I use to decode this kind of string?

In general - where can I find a list of all valid encodings?

See also [Convert XML HTML entities into Unicode string in Python](http://stackoverflow.com/questions/57708/convert-xml-html-entities-into-unicode-string-in-python) — Kos, May 11 '13 at 10:10

score 5 · Accepted Answer · answered May 11 '13 at 09:41

Python 2:

import HTMLParser
h = HTMLParser.HTMLParser()
print h.unescape('&pound;682m')
£682m

Python 3:

import html
# HTMLParser.unescape() was deprecated in 3.4 and removed in 3.9; use html.unescape()
print(html.unescape('&pound;682m'))
£682m
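Since the question is about the numeric `&#number;` form, here is a minimal Python 3 sketch showing that the same call also handles decimal and hexadecimal references. The helper name `decode_entities` is only an illustration and is not part of the standard library; `html.unescape` is.

```python
import html

def decode_entities(text):
    """Replace named and numeric character references with their characters."""
    return html.unescape(text)

decimal_form = decode_entities('&#163;682m')   # decimal reference, &#number;
hex_form = decode_entities('&#xa3;682m')       # hexadecimal reference
named_form = decode_entities('&pound;682m')    # named reference

# All three spellings should decode to the same string.
print(decimal_form == hex_form == named_form)
```

The result is an ordinary `str`, so no `.decode()` call is involved at any point.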

.encode and .decode work a little differently than you might expect, I'm afraid. See the following:

print repr('å'.decode('iso-8859-1'))
u'\x86'

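The string was encoded in my console's encoding when I typed it (å), which is why the byte here is 0x86. iso-8859-1 and latin-1 are two names for the same encoding, so decoding with either one only gives the right character if the bytes really were written in it. Decode with the encoding the input actually uses, then re-encode to the one your endpoint expects.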
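A minimal Python 3 sketch of that decode-then-re-encode step. It assumes the console was using code page 437, where å is the single byte 0x86; the answer does not say which code page was in use, so treat `cp437` as an assumption.

```python
# Assumption: the console used code page 437, where 'å' is the byte 0x86.
raw = b'\x86'

# Decoding with the wrong codec does not fail, it just yields another character.
wrong = raw.decode('iso-8859-1')          # U+0086, a control character

# Decoding with the codec the bytes were written in gives the intended text.
text = raw.decode('cp437')                # U+00E5, 'å'

# Re-encode for an endpoint that expects iso-8859-1.
for_endpoint = text.encode('iso-8859-1')  # a single byte, 0xE5

print(ascii(wrong), ascii(text), for_endpoint)
```

The general pattern is bytes -> `decode` (source encoding) -> `str` -> `encode` (target encoding) -> bytes.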

For more info on character encodings: http://en.wikipedia.org/wiki/Character_encoding
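As for a list of valid encodings: the "Standard Encodings" table in the `codecs` module documentation is the reference. As a rough sketch, the alias table shipped with Python 3 can also be inspected at runtime; it may not cover every codec, so treat the result as a starting point rather than a complete list.

```python
import codecs
from encodings.aliases import aliases

# aliases maps alternative spellings to the codec module name,
# e.g. 'utf8' -> 'utf_8' and 'latin1' -> 'latin_1'.
codec_names = sorted(set(aliases.values()))
print(len(codec_names), codec_names[:5])

# codecs.lookup() resolves any accepted spelling to a codec,
# which shows that latin-1 and iso-8859-1 are the same encoding.
same_codec = codecs.lookup('latin-1').name == codecs.lookup('iso-8859-1').name
print(same_codec)
```

Note that HTML character references such as `&#number;` are not a codec, which is why no name passed to `.decode()` handles them.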

answered May 11 '13 at 09:41

Torxed


Thank you for your answer - but I was asking about translating back something in the format: &#(some_number); what you provided doesn't work for that – tomermes May 11 '13 at 10:03
`&#number;` -> `symbol`; you can also go `symbol` -> `&#number;` via `h.escape()`, obviously. Your question was that you **had** a number and wanted to "decode" it, and that's what my solution does. Ask your question correctly if you want another answer, but as I mentioned, you can do this in reverse and get the opposite. – Torxed May 11 '13 at 11:21
Sorry! You were right, your code does work. I have another problem while trying to write it into a file, but that's a whole other story. – tomermes May 11 '13 at 13:23
