Dealing with non-BMP Characters in Python

Asked Jan 25 '16 at 17:44

Active Jan 25 '16 at 17:47

I have non-BMP characters appearing frequently in the text I'm trying to print, which is causing many errors like this in my IDLE window:

UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 18-18: Non-BMP character not supported in Tk

I would like to scan the text before printing it, find these characters, and replace each one with \uFFFD (�), so that the Python program doesn't crash and I, the user, can still see that a non-BMP character was there.
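
To make the goal concrete, here is a minimal sketch of the kind of replacement described above. It assumes Python 3.3 or later, where a `str` is a sequence of code points, so anything above U+FFFF can be matched directly. The names `replace_non_bmp` and `_NON_BMP` are made up for illustration and are not part of any library.

```python
import re

# Matches any single code point outside the Basic Multilingual Plane,
# i.e. U+10000 through U+10FFFF. (Hypothetical helper name.)
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def replace_non_bmp(text, replacement="\uFFFD"):
    """Return text with every non-BMP character swapped for replacement."""
    return _NON_BMP.sub(replacement, text)


if __name__ == "__main__":
    sample = "smile \U0001F600 ok"
    # The emoji is one non-BMP code point, so exactly one U+FFFD appears.
    print(ascii(replace_non_bmp(sample)))  # 'smile \ufffd ok'
```

Calling `print(replace_non_bmp(text))` instead of `print(text)` should then avoid the Tk error, since every remaining character fits in the BMP.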

edited Jan 25 '16 at 17:47

Andrea Corbellini

asked Jan 25 '16 at 17:44

Jake Hillion

What Python version are you using? How are you decoding your string? Are you explicitly calling `decode()` or something else? Did you take a look at the [codecs error handlers](https://docs.python.org/3/library/codecs.html#error-handlers)? – Andrea Corbellini Jan 25 '16 at 17:49
@glibdud pretty sure that's it, I didn't see that, thanks for letting me know :) when my code runs across the issue again, if it works, I'll mark it as solved. – Jake Hillion Jan 25 '16 at 19:51

0 Answers