I need to convert HTML entities like '&#8217' into Unicode strings. I've read that the html.unescape function can do this, so I gave it a try.

import html
print(html.unescape('&#8217'))

This line works correctly when typed in IDLE (Python Shell): the quotation mark appears just as it should. But when I create a .py file with that line of code and run it from the console, I get an error: UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 0: character maps to <undefined>.

So why does it fail in the console but work in IDLE? And what should I do? I need HTML entities to be converted as part of a parser I'm writing.

parsecer
  • `html.unescape()` works **fine**. It is **printing** that is the problem, because your console can't handle *that specific character*. – Martijn Pieters Aug 06 '16 at 19:19
  • @Martijn Pieters Is there any way to make the console aware of this character? If the console can't handle it, I can't be sure that later use of that string (which will be put into a database) won't fail. – parsecer Aug 06 '16 at 19:21
  • For future reference: try *narrowing down the problem*; `result = html.unescape('&#8217')`, then `print(result)` on separate lines would have pointed you to `print()`, not to `html.unescape()`. – Martijn Pieters Aug 06 '16 at 19:21
  • I've closed this as a duplicate of the canonical question on Python 3 and printing to the Windows console. Not using the Windows console is one way to avoid this issue. – Martijn Pieters Aug 06 '16 at 19:21
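
To illustrate the point made in the comments, here is a minimal sketch that separates the conversion from the printing. The helper `can_encode` and the choice of `cp437` as the console code page are assumptions for illustration only; the actual console encoding depends on the machine. `sys.stdout.reconfigure` requires Python 3.7 or later, so the sketch guards for it.

```python
import html
import sys


def can_encode(text, encoding):
    """Hypothetical helper: report whether `text` fits in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Step 1: the conversion itself. This never touches the console.
decoded = html.unescape('&#8217')

# ascii() escapes non-ASCII characters, so this print is safe on any console.
print(ascii(decoded))

# Step 2: check which encodings can represent the character.
# cp437 is an assumed legacy console code page; UTF-8 can hold any character.
print('cp437:', can_encode(decoded, 'cp437'))
print('utf-8:', can_encode(decoded, 'utf-8'))

# Step 3: one possible workaround (Python 3.7+): keep the console encoding
# but emit escapes instead of raising for characters it cannot represent.
if hasattr(sys.stdout, 'reconfigure'):
    sys.stdout.reconfigure(errors='backslashreplace')
    print(decoded)
```

The string itself is an ordinary Python `str`, so later use (for example, storing it in a database over a UTF-8 connection) should not be affected by what the console can display. Setting the `PYTHONIOENCODING` environment variable is another commonly used option.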
