Might be a silly question, but for some reason &amp;amp; isn't recognized as &. I get text from an API, and &amp;amp; is printed as &amp;amp; and not &. I encode via UTF-8, but it doesn't catch it.
-2

rodling
- Please post some code (especially the encoding part) so that we can help you :) – Samuele Mattiuzzo Oct 18 '12 at 15:57
- You have referenced an HTML entity, and it is hard to say where you "print" it and what exactly the result of the API call is. – Tadeck Oct 18 '12 at 15:59
- See http://stackoverflow.com/questions/2087370/decode-html-entities-in-python-string – Mark Ransom Oct 18 '12 at 15:59
- Slipped my mind to check for HTML. Thanks for the link @MarkRansom, that solved all my problems. – rodling Oct 18 '12 at 16:10
2 Answers
4
&amp;amp; is the HTML escape sequence for the ampersand. It has nothing to do with the character encoding. If you open the page you're fetching in your browser (if possible), you'll see it in the source code as well.
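To illustrate the difference between character encoding and HTML escaping, here is a minimal sketch. It assumes Python 3.4 or later, where the standard library provides `html.unescape`; the thread itself dates from 2012 and does not say which Python version was in use. The sample string is hypothetical and stands in for the API response.

```python
import html

# Hypothetical text as it might come back from the API.
raw = "Tom &amp; Jerry"

# A UTF-8 round trip leaves the entity untouched, because "&amp;"
# is just five ordinary ASCII characters as far as the codec is concerned.
round_tripped = raw.encode("utf-8").decode("utf-8")

# html.unescape converts HTML entities to the characters they stand for.
decoded = html.unescape(raw)

print(round_tripped)
print(decoded)
```

The entity has to be decoded as HTML, separately from whatever byte encoding the response uses.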

Markus Unterwaditzer
1
You can try using BeautifulSoup to translate the HTML entity names.
# BeautifulSoup 3 import; the bs4 package uses a different import path
from BeautifulSoup import BeautifulStoneSoup
BeautifulStoneSoup("&amp;amp;", convertEntities=BeautifulStoneSoup.ALL_ENTITIES)

Abhijit