python html.unescape() doesn't work in console

Asked Aug 06 '16 at 19:18

I need to convert HTML entities like '&#8217' into Unicode strings. I've read that the html.unescape function can do it, so I gave it a try.

import html
print(html.unescape('&#8217'))

This line, if typed in IDLE (Python Shell), works correctly - the quotation mark appears just as it should. But when I create a .py file with that line of code and try to run it from the console, an error occurs - UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 0: character maps to <undefined>.

So why does it fail in the console but work in IDLE? And what should I do? I need HTML entities to be converted as part of a parser I'm writing.

edited Aug 06 '16 at 19:19

asked Aug 06 '16 at 19:18

parsecer

`html.unescape()` works **fine**. It is **printing** that is the problem, because your console can't handle *that specific character*. – Martijn Pieters Aug 06 '16 at 19:19
@Martijn Pieters Is there any way to make the console aware of this character? If the console can't handle it, I can't be sure that later use of that string (which will be put into a database) won't fail. – parsecer Aug 06 '16 at 19:21
For future reference: try *narrowing down the problem*; `result = html.unescape('&#8217')`, then `print(result)` on separate lines would have pointed you to `print()`, not to `html.unescape()`.
I've duplicated you to the canonical question on Python 3 and printing to the Windows console. Not using the Windows console is one way to avoid this issue. – Martijn Pieters Aug 06 '16 at 19:21
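
To make the points in the comments concrete, here is a minimal sketch of the narrowing-down steps. It assumes the console uses a legacy code page such as cp437 (the question does not say which one is actually in use), and `make_tolerant_stream` is a hypothetical helper name, not something from the standard library. It separates the unescaping from the output, reproduces the encoding failure without a real console, and shows one possible way to write the text without raising.

```python
import html
import io

# Step 1: unescape on its own line; no output is involved yet.
result = html.unescape('&#8217')
assert result == '\u2019'  # RIGHT SINGLE QUOTATION MARK

# Step 2: the failure comes from encoding the string for output.
# cp437 is only an assumed console code page; it has no mapping for U+2019.
try:
    result.encode('cp437')
    failed = False
except UnicodeEncodeError:
    failed = True

# UTF-8 can encode any Unicode string, so the value itself is fine
# for later use, for example when stored in a UTF-8 database.
utf8_bytes = result.encode('utf-8')


def make_tolerant_stream(raw, encoding):
    """Hypothetical helper: wrap a binary stream so that characters the
    encoding cannot represent are written as backslash escapes."""
    return io.TextIOWrapper(raw, encoding=encoding, errors='backslashreplace')


# Step 3: printing through the tolerant stream does not raise.
buffer = io.BytesIO()
stream = make_tolerant_stream(buffer, 'cp437')
print(result, file=stream)
stream.flush()
printed = buffer.getvalue()
```

Here `failed` ends up `True`, which mirrors the error from the question, while `utf8_bytes` shows the string survives a UTF-8 round trip. For the real console, setting the `PYTHONIOENCODING` environment variable or, on Python 3.7 and later, calling `sys.stdout.reconfigure(errors='backslashreplace')` should have a similar effect, though whether the glyph itself is displayed still depends on the console and its font.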
