As a project to help me learn Python, I'm making a CMD viewer of Reddit using the json data (for example www.reddit.com/all/.json). When certain posts show up and I attempt to print them (that's what I assume is causing the error), I get this error:

Traceback (most recent call last):
  File "C:\Users\nsaba\Desktop\reddit_viewer.py", line 33, in <module>
    print ( "%d. (%d) %s\n" % (i+1, obj['data']['score'], obj['data']['title']))
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 32: character maps to <undefined>

Here is where I handle the data:

import json
import urllib.request

request = urllib.request.urlopen(url)  # url is e.g. www.reddit.com/all/.json
content = request.read().decode('utf-8')
jstuff = json.loads(content)

The line that prints the data (line 33 in the traceback above):

print ( "%d. (%d) %s\n" % (i+1, obj['data']['score'], obj['data']['title']))

Can anyone suggest where I might be going wrong?

abarnert
N-Saba
  • The problem almost certainly has nothing to do with JSON, or with anything else in your code. Try just `print('\u2019')` and see if you get the same error. If so, the problem is that your terminal ("DOS box") isn't set up to do Unicode output properly, and that's what you need to fix. – abarnert Aug 27 '13 at 19:31
  • Yes you're right. The reason for the extra data is because I've learned to ask questions given the information I have, and not about what I think it might be. – N-Saba Aug 27 '13 at 19:42
  • But you should post the minimal complete example that demonstrates your problem. That's what an [SSCCE](http://sscce.org) is all about. If `print('\u2019')` is sufficient to demonstrate it, any more complicated example is just going to lead people on wild goose chases. If you're worried people might ask "Why would you want to print that character?", then you can add the context that explains it… but still, lead with the actual problem. – abarnert Aug 27 '13 at 19:45
  • Also, when you have a problem with Python 3, especially when it's about something that's a major change from Python 2 (like Unicode printing), you should use the python-3.x tag. Otherwise, a lot of people will give you a Python 2.x-specific answer (as, in fact, two people did here…). – abarnert Aug 27 '13 at 19:47

5 Answers

It's almost certain that your problem has nothing to do with the code you've shown, and can be reproduced in one line:

print('\u2019')

If your terminal's character set can't handle U+2019 (or if Python is confused about what character set your terminal uses), there's no way to print it out. It doesn't matter whether it comes from JSON or anywhere else.
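A quick way to confirm this without the network or the terminal is to run the two steps separately: decoding the JSON succeeds, and only encoding the resulting string to the console's code page fails. This sketch assumes cp437, the codec named in the traceback; the sample JSON string is made up for illustration.

```python
import json

# Made-up JSON fragment with an escaped right single quote (U+2019).
raw = '{"title": "Intel\\u2019s Sharp-Eyed Social Scientist"}'

# Step 1: parsing the JSON works and yields an ordinary str.
title = json.loads(raw)['title']
# ascii() escapes non-ASCII characters, so this line can't trigger the error itself.
print(ascii(title))  # 'Intel\u2019s Sharp-Eyed Social Scientist'

# Step 2: print() must encode that str for the console. With cp437
# (the code page in the traceback) U+2019 has no byte to map to.
try:
    title.encode('cp437')
except UnicodeEncodeError as exc:
    print('encode failed:', exc.reason)
```

If step 1 prints the escaped title and step 2 reports the failure, the JSON handling is fine and only the output encoding is the problem.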

The Windows terminal (aka "DOS prompt" or "cmd window") is usually configured for a single-byte code page like cp437 (the one in your traceback), which can represent only 256 of Unicode's 0x110000 (1,114,112) code points, and there's nothing Python can do about this without a major change to the language implementation.*
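To see how small that repertoire is, you can decode every possible byte value with the code page. This sketch assumes cp437 (from the traceback); any single-byte code page yields at most 256 characters.

```python
# A single-byte code page maps each of the 256 byte values to one character,
# so decoding all of them lists its entire repertoire.
repertoire = set(bytes(range(256)).decode('cp437'))

print(len(repertoire))         # 256
print('\u2019' in repertoire)  # False: no cp437 byte means U+2019
```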

See PrintFails on the Python Wiki for details, workarounds, and links to more information. There are also a few hundred dups of this problem on SO (although many of them will be specific to Python 2.x, without mentioning it).
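One of the simplest workarounds is to accept that some characters can't be shown and substitute a placeholder for them. The helper below is a hypothetical sketch (`console_safe` is a made-up name, not a library function): it round-trips the text through the console's encoding with `errors='replace'`, so `print()` never receives a character it can't encode.

```python
import sys

def console_safe(text, encoding=None):
    """Hypothetical helper: replace characters the console can't encode with '?'."""
    # Default to the encoding print() itself uses for sys.stdout.
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'utf-8'
    return text.encode(encoding, errors='replace').decode(encoding)

# With cp437, U+2019 becomes '?' instead of raising UnicodeEncodeError.
print(console_safe('Intel\u2019s Sharp-Eyed Social Scientist', 'cp437'))
# Intel?s Sharp-Eyed Social Scientist
```

In the question's loop this would mean wrapping `obj['data']['title']` in `console_safe(...)` inside the existing format string. The obvious trade-off is that the reader sees `?` instead of the real character.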


* Windows has a whole separate set of APIs for printing UTF-16 to the terminal, so Python could detect that stdout is a Windows terminal, and if so encode to UTF-16 and use the special APIs instead of encoding to the terminal's charset and using the standard ones. But this raises a bunch of different problems (e.g., different ways of printing to stdout getting out of sync). There's been discussion about making these changes, but even if everyone were to agree and the patch were written tomorrow, it still wouldn't help you until you upgrade to whatever future version of Python it's added to…

abarnert
  • Sorry for the dupe. I'll look around the resources you've provided. – N-Saba Aug 27 '13 at 19:46
  • @N-Saba: Well, it's hard to know this is a dup, because it's not really clear what you should be searching for until you already know at least half the answer… – abarnert Aug 27 '13 at 19:48
  • @N-Saba, I know this is an old thread, but you should mark this as the answer if it did answer your question (it did mine) – ivan7707 Feb 27 '15 at 16:32

@N-Saba, what is the string that causes the error to be thrown? In my test case, this looks to be a version-specific bug in Python 2.7.3.

In the feed I was parsing, the "title" field had the following value:

u'title': u'Intel\u2019s Sharp-Eyed Social Scientist'

I get the expected right single quote character when I call either of these in Python 2.7.6:

python -c "print {u'title': u'Intel\u2019s Sharp-Eyed Social Scientist'}['title']"
Intel’s Sharp-Eyed Social Scientist

In 2.7.3, I get the error unless I encode the value that I pulled out by key name:

print {u'title': u'Intel\u2019s Sharp-Eyed Social Scientist'}['title']
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 5: ordinal not in range(128)
print {u'title': u'Intel\u2019s Sharp-Eyed Social Scientist'}['title'].encode('utf-8', 'replace')
Intel’s Sharp-Eyed Social Scientist
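In Python 3 the closest equivalent of that `.encode('utf-8')` trick is to encode the string yourself and write the bytes to `sys.stdout.buffer`, the binary stream underneath the text-mode `sys.stdout`. This is a sketch, not a fix: the bytes only display as ’ if the terminal itself interprets its output as UTF-8, which a cp437 Windows console does not by default.

```python
import sys

title = 'Intel\u2019s Sharp-Eyed Social Scientist'
data = title.encode('utf-8')  # U+2019 becomes the three bytes b'\xe2\x80\x99'

# Flush pending text first so text and raw-byte output don't get out of order.
sys.stdout.flush()
out = getattr(sys.stdout, 'buffer', None)  # may be absent if stdout was replaced
if out is not None:
    out.write(data + b'\n')
    out.flush()
```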

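fwiw, if @abarnert's command is typed as print(u'\2019') (no u after the backslash), on a UTF-8 terminal it shows just "9": \201 is read as an octal escape for an invisible control character. The intended code is print(u'\u2019').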

Monte Hayward

I came across a similar error when attempting to write an API JSON output to a .csv file via pd.DataFrame.to_csv() on a Windows install of Python 2.7.14.

Specifying the encoding as utf-8 fixed my process:

df.to_csv(filename, encoding='utf-8')  # df is the DataFrame built from the API output
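The same idea carries over if you write the file with Python 3's standard-library `csv` module instead of pandas (a swapped-in technique, not what this answer used): pass `encoding='utf-8'` to `open()` so the file doesn't fall back to the platform's default encoding, which on Windows is typically an ANSI code page such as cp1252. The file name and rows below are made up.

```python
import csv
import os
import tempfile

# Made-up sample rows; U+2603 (snowman) is not representable in cp1252.
rows = [
    {'score': 42, 'title': 'Intel\u2019s Sharp-Eyed Social Scientist'},
    {'score': 7, 'title': 'Snowman \u2603'},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'posts.csv')

    # newline='' is what the csv module expects; encoding='utf-8' avoids
    # depending on the platform's default encoding.
    with open(path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['score', 'title'])
        writer.writeheader()
        writer.writerows(rows)

    # Read it back with the same encoding to check the round trip.
    with open(path, encoding='utf-8', newline='') as f:
        back = list(csv.DictReader(f))

# csv stores everything as text, so numbers come back as strings.
print(back[1]['score'])  # 7
```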
yeliabsalohcin

For anyone encountering this on macOS, @abarnert's answer is correct, and I was able to fix it by putting this at the top of the offending source file:

# magic to make everything work in Unicode (Python 2 only)
import sys
reload(sys)  # re-exposes sys.setdefaultencoding, which site.py deletes at startup
sys.setdefaultencoding('utf-8')

To clarify, this is making sure the terminal output accepts Unicode correctly.
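Python 3 has no `sys.setdefaultencoding`. If the goal is specifically to change the encoding `print()` uses, a rough Python 3 counterpart is `TextIOWrapper.reconfigure()` (Python 3.7+), e.g. `sys.stdout.reconfigure(encoding='utf-8')`, or setting the `PYTHONIOENCODING` environment variable before starting Python. The sketch below shows the method on an in-memory stream standing in for a console (`fake_console` is a hypothetical name) so the effect is visible as bytes; whether a real terminal renders those bytes correctly still depends on the terminal's own settings.

```python
import io

def fake_console(encoding):
    """Hypothetical stand-in for sys.stdout: a text layer over an in-memory byte buffer."""
    return io.TextIOWrapper(io.BytesIO(), encoding=encoding)

# With a cp437 "console", writing U+2019 fails just like print() did.
broken = fake_console('cp437')
try:
    broken.write('Intel\u2019s')
except UnicodeEncodeError as exc:
    print('cp437:', exc.reason)

# reconfigure() swaps the encoding of an existing text stream.
con = fake_console('cp437')
con.reconfigure(encoding='utf-8')
con.write('Intel\u2019s')
con.flush()
result = con.buffer.getvalue()
print(result)  # b'Intel\xe2\x80\x99s'
```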

Echelon
  • Note that this will not work in Python 3 because Python 3 is UTF-8 already and has no ability to set default encoding – APorter1031 Sep 09 '20 at 18:17

I set the default font of IDLE (Python Shell) and of the Windows CMD window to Lucida Console (a Unicode-capable font), and these types of errors went away; you also no longer see boxes [][][][][][][][]

:)

blakev