Unicode issue trying to process JSON file

Question

I have a Python script that writes JSON to a file with content that looks like this:

{
    "album": "Night Hawk",
    "album_artist": "Coleman Hawkins with Eddie \u201cLockjaw\u201d Davis",
    "artist": "Coleman Hawkins with Eddie \u201cLockjaw\u201d Davis",
    "bitrate": 744,
    ...
}

The file is uploaded to the server and processed with this:

with open(settings.JSON_UPLOAD_DIRECTORY + f.name, 'wb+') as destination:
    for chunk in f.chunks():
        destination.write(chunk)

This works without error on my macOS development server. It has also worked for several thousand files on my deployment server until now. All of a sudden I'm getting this error:

with open(settings.JSON_UPLOAD_DIRECTORY + f.name, 'wb+') as destination:

Exception Value: 'ascii' codec can't encode character '\u201c' in position 81: ordinal not in range(128)

I've read other posts about this here without coming to an understanding of what I'm doing wrong. I'm running Python 3.6. My question is: do I need to adjust the statement that opens the in-memory file for writing, or is there a problem with the encoding of the JSON file itself?

What is `f`? The std-lib `json` module doesn't have a method `chunks`, as far as I can see. — lenz, Dec 27 '17 at 21:26
`json.dump` and friends generate `str` data, so you need to open output files in text mode ("wt"). Also, `json.dump` has a parameter `ensure_ascii`, which might be of help. — lenz, Dec 27 '17 at 21:29
@lenz 'f' is the variable I used for the 'in memory' file I'm writing to disk. I should probably change that for clarity. — Daniel Jewett, Dec 28 '17 at 00:19
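
To illustrate the `ensure_ascii` parameter mentioned in the comment above, here is a minimal sketch using only the standard `json` module. The `record` dictionary is a made-up sample based on the data in the question, not the asker's actual code.

```python
import json

# Sample record modeled on the question's data (an assumption for illustration).
record = {"artist": "Coleman Hawkins with Eddie \u201cLockjaw\u201d Davis"}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes,
# so the serialized text contains only ASCII characters.
escaped = json.dumps(record)

# ensure_ascii=False keeps the characters as they are; a file receiving this
# text must be opened with an encoding that can represent them, e.g. UTF-8.
literal = json.dumps(record, ensure_ascii=False)

print(escaped)
# Both forms parse back to the same data.
print(json.loads(escaped) == json.loads(literal))
```

Either form is valid JSON; the escaped form is simply safer when the encoding of the destination is unknown.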

score 0 · Answer 1 · answered Dec 27 '17 at 21:26

I found inspiration in this other answer; maybe it'll help. I believe the idea is to use a consistent encoding both when writing and when reading the file.

As you can see below, I took the liberty of adding a few more "problematic" characters to the JSON string (the "album" attribute):

json_str = """{
    "album": "Ñíght Håwk 你好",
    "album_artist": "Coleman Hawkins with Eddie \u201cLockjaw\u201d Davis",
    "artist": "Coleman Hawkins with Eddie \u201cLockjaw\u201d Davis",
    "bitrate": 744
}"""


import json
import tempfile
import os

print(json.loads(json_str))  # Just double checking
path = os.path.join(tempfile.gettempdir(), 'foo.txt')
print("path=%s" % path)

with open(path, 'w', encoding='utf-8') as destination:
    # The encoding= is the important part
    destination.write(json_str)

with open(path, 'r', encoding='utf-8') as source:
    # The encoding= is the important part
    print(json.loads(source.read()))

This seems to output a properly parsed dictionary:

{'album_artist': 'Coleman Hawkins with Eddie “Lockjaw” Davis', 'artist': 'Coleman Hawkins with Eddie “Lockjaw” Davis', 'album': 'Ñíght Håwk 你好', 'bitrate': 744}
path=/tmp/foo.txt
{'album_artist': 'Coleman Hawkins with Eddie “Lockjaw” Davis', 'artist': 'Coleman Hawkins with Eddie “Lockjaw” Davis', 'album': 'Ñíght Håwk 你好', 'bitrate': 744}

However, this output also depends on the terminal's configuration, so I'm not 100% sure it will work in your case. I'm using Python 3.5.1. For Python 2.6.x, you should use the io library (`io.open` accepts an `encoding` argument).
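
Since the result depends on how the environment is configured, it may help to check which encodings Python has picked up on each machine. This is only a diagnostic sketch using standard-library calls; the values it prints will differ between a development machine and a deployment server.

```python
import locale
import sys

# Encoding used when printing to the terminal (None if stdout was replaced
# by an object without an encoding attribute).
print("stdout encoding:    ", getattr(sys.stdout, "encoding", None))

# Locale-derived encoding; on Python 3.6, open() in text mode falls back to
# this when no encoding= argument is given.
print("preferred encoding: ", locale.getpreferredencoding(False))

# Encoding used to convert str file names to bytes for the operating system.
print("filesystem encoding:", sys.getfilesystemencoding())
```

If any of these report an ASCII-only encoding on the server but UTF-8 locally, that difference is a likely place to look.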

It turns out that the issue is not with the content of the JSON file but with the characters in the file name I'm uploading. I'll have to see what I can do about that next. — Daniel Jewett, Dec 28 '17 at 00:17
@DanielJewett if the problem is unrelated to this answer, why did you accept it? You should update your question, or maybe even replace it with a new one. — lenz, Dec 28 '17 at 08:29
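
As a follow-up to the comment that the file name, not the JSON content, was the culprit: the sketch below shows how a non-ASCII file name produces the same kind of error message when it has to be encoded as ASCII. The file name is hypothetical (the real uploaded name is not shown in the question), and `can_encode` is a helper invented for this illustration. Whether the server actually uses an ASCII filesystem encoding is an assumption.

```python
# Hypothetical file name for illustration only.
file_name = "Eddie \u201cLockjaw\u201d Davis.json"


def can_encode(name, encoding):
    """Return True if `name` can be converted to bytes with `encoding`."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(file_name, "ascii"))  # the curly quotes are not ASCII
print(can_encode(file_name, "utf-8"))  # UTF-8 can represent them

# Reproduce the error text by forcing an ASCII encode of the name.
try:
    file_name.encode("ascii")
except UnicodeEncodeError as exc:
    message = str(exc)
    print(message)
```

When `open()` receives a `str` path, Python encodes it with the filesystem encoding, so a server whose locale yields an ASCII filesystem encoding could fail on such a name even in binary mode.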
