Might be a silly question, but for some reason &amp; isn't recognized as &. I get text from an API, and &amp; is printed as &amp; and not as &. I encode via UTF-8, but that doesn't catch it.

rodling

2 Answers

&amp; is the HTML escape sequence for the ampersand. It has nothing to do with the character encoding. If you open the page you're fetching in your browser (if possible), you'll see it in the source code as well.
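
To illustrate the difference, here is a minimal sketch using only the Python 3 standard library. It assumes the API text arrives as an ordinary string; the sample value `raw` is made up for the example. A UTF-8 round trip leaves the entity as it is, while `html.unescape` turns it into a literal ampersand.

```python
import html

# Hypothetical sample of text returned by the API.
raw = "Tom &amp; Jerry"

# Encoding and decoding as UTF-8 does not touch the entity:
# "&amp;" is just five ordinary ASCII characters.
round_tripped = raw.encode("utf-8").decode("utf-8")

# html.unescape (Python 3.4+) converts HTML entities to characters.
decoded = html.unescape(raw)

print(round_tripped)
print(decoded)
```

Unescaping is only appropriate if the API really returns HTML-escaped text; if the result is later written back into an HTML page, it has to be escaped again.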

Markus Unterwaditzer

You can try using BeautifulSoup to translate the HTML entity names.

# BeautifulSoup 3 API: parses the string and converts entities such as &amp; to characters
from BeautifulSoup import BeautifulStoneSoup
BeautifulStoneSoup("&amp;", convertEntities=BeautifulStoneSoup.ALL_ENTITIES)
Abhijit