I have a Python script that writes JSON to a file with content that looks like this:

{
    "album": "Night Hawk",
    "album_artist": "Coleman Hawkins with Eddie \u201cLockjaw\u201d Davis",
    "artist": "Coleman Hawkins with Eddie \u201cLockjaw\u201d Davis",
    "bitrate": 744,
    ...
}

The file is uploaded to the server and processed with this:

with open(settings.JSON_UPLOAD_DIRECTORY + f.name, 'wb+') as destination:
    for chunk in f.chunks():
        destination.write(chunk)

This works without error on my macOS development server. It has also worked for processing several thousand files on my deployment server until now. All of a sudden I'm getting this error:

with open(settings.JSON_UPLOAD_DIRECTORY + f.name, 'wb+') as destination:

Exception Value: 'ascii' codec can't encode character '\u201c' in position 81: ordinal not in range(128)

I've read other posts about this here without coming to an understanding of what I'm doing wrong. I'm running Python 3.6. My question is: do I need to adjust the statement that opens the in-memory file for writing, or is there a problem with the encoding of the JSON file itself?

  • What is `f`? The std-lib `json` module doesn't have a method `chunks`, as far as I can see. – lenz Dec 27 '17 at 21:26
  • `json.dump` and friends generate `str` data, so you need to open output files in text mode ("wt"). Also, `json.dump` has a parameter `ensure_ascii`, which might be of help. – lenz Dec 27 '17 at 21:29
  • @lenz 'f' is the variable I used for the 'in memory' file I'm writing to disk. I should probably change that for clarity. – Daniel Jewett Dec 28 '17 at 00:19

1 Answer

I found inspiration in this other answer. Maybe it'll help? I believe the idea is to use a consistent encoding when both writing and reading the file.

As you can see, I took the liberty of adding a few more "problematic" characters to the JSON string (the "album" attribute):

json_str = """{
    "album": "Ñíght Håwk 你好",
    "album_artist": "Coleman Hawkins with Eddie \u201cLockjaw\u201d Davis",
    "artist": "Coleman Hawkins with Eddie \u201cLockjaw\u201d Davis",
    "bitrate": 744
}"""


import json
import tempfile
import os

print(json.loads(json_str))  # Just double checking
path = os.path.join(tempfile.gettempdir(), 'foo.txt')
print('path=' + path)  # Show where the temporary file is written

with open(path, 'w+', encoding='utf-8') as destination:
    # The encoding= is the important part
    destination.write(json_str)

with open(path, 'r+', encoding='utf-8') as source:
    # The encoding= is the important part
    print(json.loads(source.read()))

This seems to output a properly parsed dictionary:

{'album_artist': 'Coleman Hawkins with Eddie “Lockjaw” Davis', 'artist': 'Coleman Hawkins with Eddie “Lockjaw” Davis', 'album': 'Ñíght Håwk 你好', 'bitrate': 744}
path=/tmp/foo.txt
{'album_artist': 'Coleman Hawkins with Eddie “Lockjaw” Davis', 'artist': 'Coleman Hawkins with Eddie “Lockjaw” Davis', 'album': 'Ñíght Håwk 你好', 'bitrate': 744}

However, this output also depends on the terminal's configuration, so I'm not 100% sure it will work in your case. I'm using Python 3.5.1. For Python 2.6.x, you should use the io library (io.open accepts an encoding argument).
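
If the data starts out as a dict rather than a string, the `ensure_ascii` parameter mentioned in the comments on the question is also relevant. The following is only a sketch: the `record` dict and the `ensure_ascii_demo.json` file name are made up for illustration. It shows that the default `ensure_ascii=True` produces pure-ASCII output with `\u201c` escapes (like the sample in the question), while `ensure_ascii=False` writes the characters literally and therefore needs an explicit file encoding.

```python
import json
import os
import tempfile

# Hypothetical record, modelled on the sample in the question.
record = {
    "album": "Night Hawk",
    "artist": "Coleman Hawkins with Eddie \u201cLockjaw\u201d Davis",
    "bitrate": 744,
}

# Default behaviour: non-ASCII characters are escaped, so the result
# can be encoded with any ASCII-compatible codec.
escaped = json.dumps(record)
escaped.encode('ascii')  # does not raise

# ensure_ascii=False keeps the curly quotes as literal characters.
literal = json.dumps(record, ensure_ascii=False)

# Illustrative file name; writing literal characters needs an explicit encoding.
path = os.path.join(tempfile.gettempdir(), 'ensure_ascii_demo.json')
with open(path, 'w', encoding='utf-8') as destination:
    json.dump(record, destination, ensure_ascii=False, indent=4)

# Reading back with the same encoding restores the original data.
with open(path, 'r', encoding='utf-8') as source:
    restored = json.load(source)
```

Either form parses back to the same dict; the choice mainly affects how readable the file is and which encodings can hold it.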

Savir
  • It turns out that the issue is not with the content of the JSON file but with the characters in the file name I'm uploading. I'll have to see what I can do about that next. – Daniel Jewett Dec 28 '17 at 00:17
  • @DanielJewett if the problem is unrelated to this answer, why did you accept it? You should update your question, or maybe even replace it with a new one. – lenz Dec 28 '17 at 08:29
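
Following up on the file-name comment above: the traceback points at the `open()` call, and the file is opened in binary mode, so the content is never encoded there. That is consistent with the file name being what fails to encode. One possible workaround, sketched below, is to reduce the name to ASCII before building the path. The helper `ascii_safe_name` and the `'upload'` fallback are hypothetical names, not part of Django or the original code. Configuring the server process with a UTF-8 locale may make this unnecessary, but that depends on the deployment.

```python
import sys
import unicodedata


def ascii_safe_name(name):
    """Return an ASCII-only version of a file name (hypothetical helper).

    Accented letters are decomposed and lose their accents; characters
    with no ASCII equivalent, such as curly quotes, are dropped.
    """
    decomposed = unicodedata.normalize('NFKD', name)
    stripped = decomposed.encode('ascii', 'ignore').decode('ascii')
    # Fall back to a placeholder if nothing survives (assumed default).
    return stripped or 'upload'


# The encoding Python uses for file names; the value depends on the
# platform and locale, so it is printed rather than assumed.
print(sys.getfilesystemencoding())

# The character from the error message cannot be encoded as ASCII.
try:
    '\u201c'.encode('ascii')
    raised = False
except UnicodeEncodeError:
    raised = True

safe = ascii_safe_name('Eddie \u201cLockjaw\u201d Davis.json')
```

With this in place, the upload handler could call `ascii_safe_name(f.name)` when constructing the destination path. Note that two different original names can map to the same sanitized name, so a real handler would probably also need to guard against collisions.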