
I have some JSON data that contains Chinese characters.

{
  "font_size": "47",
  "sentences": [
    "你好",
    "sample sentence1",
    "sample sentence2",
    "sample sentence3",
    "sample sentence4",
    "sample sentence5",
    "sample sentence6",
    "sample sentence7",
    "sample sentence8",
    "sample sentence9"
  ]
}

I created a Flask app to receive the JSON data above, and I post the data with the following curl command.

curl -X POST \
  http://0.0.0.0:5000/ \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json;charset=UTF-8' \
  -H 'Postman-Token: af380f7a-42a8-cfbb-9177-74bb348ce5ed' \
  -d '{
  "font_size": "47",
  "sentences": [
    "你好",
    "sample sentence1",
    "sample sentence2",
    "sample sentence3",
    "sample sentence4",
    "sample sentence5",
    "sample sentence6",
    "sample sentence7",
    "sample sentence8",
    "sample sentence9"
  ]
}'

After receiving the body from request.data (which is a str), I parse it as JSON.

json_data = json.loads(request.data)

Then I want to format a template string with json_data.

subtitles.format(**json_data)

This raises an error.

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

How can I solve it? Thanks in advance.

EDIT

subtitles is read from a file.

subtitles_file = "{path}/templates/sorry/subtitles.ass".format(path=root_path)
with open(subtitles_file, 'r') as file:
     subtitles = file.read()

EDIT Python 2 or Python 3

I'm using Python 2, which is where this error occurs. Python 3 handles this automatically, so enjoy Python 3.

CoXier
  • Possibly?: https://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20 – sshashank124 Mar 14 '18 at 06:21
  • Yes I know this link however I don't know how to fix my program. Maybe I am stupid. – CoXier Mar 14 '18 at 06:25
  • Where is `subtitles` defined? Try decoding it to a unicode string and then formatting – sshashank124 Mar 14 '18 at 06:26
  • Is this Python 2 or 3? For many problems it doesn't make a difference, but for Unicode encoding problems, it's usually critically important. – abarnert Mar 14 '18 at 06:34
  • Also, if this _is_ Python 2, is there a reason you're using Python 2? Because if you just use the current version, these problems will go away automatically. (Also, Python 3 is the primary supported version for Flask, and most other packages nowadays.) – abarnert Mar 14 '18 at 06:43
  • I've added the python-2.7 tag based on your comment on my answer, but in the future, please add it yourself when you think it will be relevant, and especially when someone directly asks you in the comments. – abarnert Mar 14 '18 at 06:56
  • @CoXier: Did you try putting `# -*- coding: utf-8 -*-` at the top? – Vimanyu Mar 14 '18 at 06:56
  • @Vimanyu That might be the issue if either `subtitles` or `request.data` were literals in the source code, but one comes from a file and the other from a Flask request, and as far as I can tell there are no non-ASCII characters in his actual code. – abarnert Mar 14 '18 at 06:57
  • @abarnert I will add the relevant tag next time. – CoXier Mar 14 '18 at 07:02

1 Answer

In Python 2, when you open and read a file, what you get is a regular str, not a unicode.

Meanwhile, even if request.data is a str rather than a unicode, if any of the strings in it are non-ASCII, json_data will contain unicode.

So, when you call subtitles.format on a str template, Python 2 tries to encode each unicode value using your default encoding, which is ASCII unless you have changed it. Chinese characters cannot be encoded as ASCII, which gives exactly this error.
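The implicit step can be made visible with an explicit encode. This is only a sketch of the mechanism, not the asker's code: the variable names `greeting` and `message` are made up for illustration, and the explicit `.encode("ascii")` stands in for what Python 2 does behind the scenes during formatting.

```python
# -*- coding: utf-8 -*-
# Illustrative only: `greeting` and `message` are hypothetical names.
greeting = u"\u4f60\u597d"  # the two characters of "你好" from the request

try:
    # Python 2 performs this conversion implicitly when a unicode value
    # is substituted into a byte-string (str) template.
    greeting.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)
```

Both characters are outside the ASCII range, which is why the reported position covers characters 0 to 1.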

The simplest fix is to make subtitles a unicode object by decoding the file contents. Like this:

with open(subtitles_file, 'r') as file:
    subtitles = file.read().decode('utf-8')

… or:

import codecs
with codecs.open(subtitles_file, 'r', 'utf-8') as file:
    subtitles = file.read()

(I'm guessing that you want UTF-8; if your files are in some other encoding, obviously use that instead.)
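For an end-to-end check, here is a small sketch that should run on both Python 2.7 and Python 3. It swaps in `io.open` with an `encoding` argument (a third way to get decoded text, not one of the two options above), and everything else is assumed for illustration: the one-line template, the temporary file standing in for subtitles.ass, and the `raw_body` bytes standing in for request.data.

```python
# -*- coding: utf-8 -*-
import io
import json
import os
import tempfile

# Hypothetical stand-in for the real subtitles.ass template.
template_text = u"Size: {font_size}, first line: {sentences[0]}\n"

# Write the template to a temporary file as UTF-8 bytes.
fd, path = tempfile.mkstemp(suffix=".ass")
os.close(fd)
with io.open(path, "w", encoding="utf-8") as f:
    f.write(template_text)

# io.open decodes while reading, so `subtitles` is text
# (a unicode object on Python 2, a str on Python 3).
with io.open(path, "r", encoding="utf-8") as f:
    subtitles = f.read()
os.remove(path)

# Hypothetical stand-in for request.data: the raw UTF-8 request body.
raw_body = u'{"font_size": "47", "sentences": ["\u4f60\u597d", "sample sentence1"]}'.encode("utf-8")
json_data = json.loads(raw_body.decode("utf-8"))

# Template and values are both text now, so no implicit ASCII encode happens.
result = subtitles.format(**json_data)
```

If the result has to be written back to a file or sent in a response, encode it explicitly at that point, for example with `result.encode("utf-8")`.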

abarnert