I have a file log.json that contains one line:

{"k":"caf\u00e9"}

I run the following code on Windows 7 SP1 x64 Ultimate:

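import json
a = json.load(open('log.json', 'r'))  # a['k'] is the unicode string u'caf\xe9'
f = open('test.txt', 'w')
f.write(a['k'])  # Python 2 implicitly encodes with the default encoding here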

It runs without any issue.

When I run the same code on Mac OS X 10.10 x64, I get:

Traceback (most recent call last):
  File "/Users/francky/Documents/workspace/test.py", line 4, in <module>
    f.write(a['k'])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

How come it works fine on Windows but not on OS X?


The versions of the Python interpreter and of the json package are the same on both operating systems:

import json
import sys
print(json.__version__)
print(sys.version)

prints on OS X:

2.0.9
2.7.6 (default, Sep  9 2014, 15:04:36) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)]

and on Windows:

2.0.9
2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)]

The culprit was PyDev, which used to take the Eclipse workspace or project setting "Text file encoding" and set it as the Python default encoding (fixed in PyDev 3.4.0 and later).

To support some non-ASCII characters, I had switched the Python file's text encoding to UTF-8 in Eclipse, which caused Python's sys.getdefaultencoding() to return UTF-8.

FYI: Dangers of sys.setdefaultencoding('utf-8')
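To make the mechanism concrete, here is a minimal sketch that emulates the implicit step Python 2's file.write() performs on a unicode argument: encoding it with the interpreter's default encoding. The helper name implicit_encode is hypothetical, and the code runs on Python 3 too, where this implicit conversion no longer exists, so the encoding is passed in explicitly to stand in for sys.getdefaultencoding().

```python
def implicit_encode(text, default_encoding):
    """Hypothetical helper: mimic Python 2 converting a unicode string to
    bytes with sys.getdefaultencoding() before writing it to a file."""
    return text.encode(default_encoding)


word = u'caf\xe9'  # the value of a['k'] after json.load

# With the stock default encoding ('ascii') the conversion fails, as on OS X.
try:
    implicit_encode(word, 'ascii')
except UnicodeEncodeError as exc:
    print('ascii failed: ' + exc.reason)

# With a default of 'utf-8' (what PyDev had set on Windows) it succeeds.
print('utf-8 gives: ' + repr(implicit_encode(word, 'utf-8')))
```

The same unicode value either fails or succeeds depending only on the default encoding, which is why identical code behaved differently under the two Eclipse setups.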

Franck Dernoncourt
  • What do you get when you type `print(sys.getdefaultencoding())`? On OS X it's probably the default 'ascii', but I'm not sure what it would be on Windows, given that it works there. – lemonhead Jun 17 '15 at 04:38
  • AFAIK, OS X is doing the "right" thing: it raises an error because you've asked to write unicode that cannot be encoded with the ascii codec to disk (Python uses its default codec when you do not specify an encoding). You want to use something like `f.write(a['k'].encode('utf8'))`. – lemonhead Jun 17 '15 at 04:41
  • @lemonhead Thanks, `print(sys.getdefaultencoding())` outputs `ascii` on both computers. Indeed `.encode('utf8')` solves the issue, but any idea why Windows is fine with it? – Franck Dernoncourt Jun 17 '15 at 04:44
  • hmm, yeah not sure but this section may elucidate things: https://docs.python.org/2/howto/unicode.html#unicode-filenames – lemonhead Jun 17 '15 at 05:24
  • hmm looks like my Eclipse on Windows is modifying `sys.getdefaultencoding()` to `UTF-8` (unlike the Eclipse on Mac). I had run `print(sys.getdefaultencoding())` outside Eclipse, which was a bad idea. Mystery solved, thanks! – Franck Dernoncourt Jun 17 '15 at 05:34
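lemonhead's suggestion of encoding explicitly can be sketched as follows, assuming the output file should contain UTF-8 bytes. Opening the file in binary mode ('wb') is a choice made here so the same code runs on both Python 2 and Python 3 (Python 3 refuses to write bytes to a text-mode file); the JSON is parsed from a string literal and written to a temporary path instead of log.json and test.txt to keep the example self-contained.

```python
import json
import os
import tempfile

# Same content as log.json in the question, inlined to stay self-contained.
data = r'{"k":"caf\u00e9"}'
a = json.loads(data)

# A temporary path stands in for 'test.txt'.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

# Encode explicitly instead of relying on the interpreter's default encoding,
# then write the resulting bytes in binary mode.
with open(path, 'wb') as f:
    f.write(a['k'].encode('utf8'))

# Read the raw bytes back to see exactly what landed on disk.
with open(path, 'rb') as f:
    written = f.read()
```

Because the encoding is named in the code, the result no longer depends on whatever sys.getdefaultencoding() happens to return inside a given IDE.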

1 Answer

I get your OS X failure on Windows too, and it should fail, because writing a Unicode string to a file requires an encoding. When you write a Unicode string to a file opened with the built-in open, Python 2 implicitly converts it to a byte string using the default encoding (ASCII), which fails for non-ASCII characters. Are you sure you are running Python 2.7? Python 3 gives no error. io.open is Python 3's open made available in Python 2; in text mode it does the encoding for you and, when no encoding is given, defaults to locale.getpreferredencoding(). Here's how to fix it in Python 2:

import json
import io
data = r'{"k":"caf\u00e9"}'  # same content as log.json
a = json.loads(data)
with io.open('test.txt', 'w') as f:  # text mode: io.open encodes for you
    f.write(a['k'])

You can optionally specify the exact encoding you want for the output as an additional parameter:

with io.open('test.txt', 'w', encoding='utf8') as f:
    f.write(a['k'])

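A self-contained sketch of the io.open approach, writing to a temporary file rather than test.txt and reading it back to check what ended up on disk. It also prints locale.getpreferredencoding(), which is what io.open falls back to when encoding is omitted; that value depends on the platform and locale (e.g. cp1252 on many Western-European Windows setups), which is why passing encoding explicitly is the more predictable option.

```python
import io
import json
import locale
import os
import tempfile

data = r'{"k":"caf\u00e9"}'  # same content as log.json
a = json.loads(data)

# The encoding io.open would use if none were passed; platform-dependent.
print('preferred encoding: ' + locale.getpreferredencoding())

path = os.path.join(tempfile.mkdtemp(), 'test.txt')

# Text mode with an explicit encoding: io.open encodes the unicode string.
with io.open(path, 'w', encoding='utf8') as f:
    f.write(a['k'])

# Read back the raw bytes to confirm UTF-8 was used on disk.
with io.open(path, 'rb') as f:
    raw = f.read()

# Read back as text with the same encoding to confirm the round trip.
with io.open(path, 'r', encoding='utf8') as f:
    text = f.read()
```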
Mark Tolonen
  • Thanks, yes, I'm sure I'm on Python 2.7. – Franck Dernoncourt Jun 17 '15 at 05:21
  • I'm running Windows 7 SP1 x64 also...and it *should* fail. There's something different about your setup... – Mark Tolonen Jun 17 '15 at 05:26
  • hmm looks like my Eclipse on Windows is modifying `sys.getdefaultencoding()` to `UTF-8` (unlike the Eclipse on Mac). Mystery solved, thanks! – Franck Dernoncourt Jun 17 '15 at 05:33
  • I'd call that a bug. Changing the default encoding can break modules that assume the default doesn't change. It's not recommended. Python even disables `sys.setdefaultencoding()`. – Mark Tolonen Jun 17 '15 at 05:39
  • And to make things worse: [PyDev used to use the Eclipse workspace or project setting "Text file encoding" and sets this as the Python default encoding.](https://sw-brainwy.rhcloud.com/tracker/PyDev/315)... My PyDev version (3.0.0) is affected by this issue. – Franck Dernoncourt Jun 17 '15 at 05:42
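Because the first `print(sys.getdefaultencoding())` check was run outside Eclipse, it did not reflect what the code saw inside PyDev. A small diagnostic along these lines (the function name encoding_report is hypothetical), run once from the IDE and once from a terminal, makes such differences visible. It also reports whether sys.setdefaultencoding is reachable: in a normal Python 2 interpreter site.py deletes it at start-up, and Python 3 does not have it at all.

```python
import locale
import sys


def encoding_report():
    """Hypothetical helper: collect the encodings that affect text I/O."""
    return {
        # Used by Python 2 for implicit unicode <-> str conversions.
        'default': sys.getdefaultencoding(),
        # Used to convert Unicode file names for the operating system.
        'filesystem': sys.getfilesystemencoding(),
        # Used by io.open (and Python 3's open) when no encoding is given.
        'preferred': locale.getpreferredencoding(),
        # False in a normal interpreter: site.py deletes it in Python 2,
        # and Python 3 never had it.
        'setdefaultencoding_available': hasattr(sys, 'setdefaultencoding'),
    }


if __name__ == '__main__':
    for key, value in sorted(encoding_report().items()):
        print('%s: %s' % (key, value))
```

If the 'default' line differs between the IDE run and the terminal run, the IDE is changing the interpreter's default encoding, as PyDev did here.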