Python encoding and json dumps

Question

I apologize if this question has been asked before. I am still not clear about encoding in Python 3.2.

I am reading a CSV file (encoded in UTF-8 without a BOM) that contains French accented characters.

Here is the code that opens and reads the CSV file:

import csv

# in_file, id and locale are presumably defined earlier in the view
with open(in_file, 'r', encoding='utf-8', newline='') as csvfile:
    fieldnames = ("id", "locale", "message")
    reader = csv.DictReader(csvfile, fieldnames, escapechar="\\")
    for row in reader:
        if row['id'] == id and row['locale'] == locale:
            out = row['message']

I return the message (out) as JSON:

jsonout = json.dumps(out, ensure_ascii=True)
return HttpResponse(jsonout, content_type="application/json; charset=utf-8")

However, when I preview the result, the French accented e (é) is replaced by \u00e9.

Can you please advise on what I am doing wrong, and what I should do so that the JSON output shows the proper accented e?

Thanks

*“However when I preview the result I get the accent e(French) being replaced by \u00e9.”* – That sounds right to me. What is your question? — poke, Feb 23 '16 at 16:17
yeah if you do `print('\u00e9')` in python it prints out é so the representation is correct. If you are seeing `\u00e9` that just means that the program you are seeing it in doesn't understand the accent. — Tadhg McDonald-Jensen, Feb 23 '16 at 16:21
The HttpResponse prints out the response as \u00e9. I would like to know what change I need to make so that the response prints out e with the accent. — tkansara, Feb 23 '16 at 16:22
Duplicate of https://stackoverflow.com/questions/18337407/saving-utf-8-texts-with-json-dumps-as-utf8-not-as-u-escape-sequence — tripleee, Feb 10 '21 at 11:22

marcelm · Accepted Answer · score 26 · 2016-02-23T16:31:11.167

You're doing nothing wrong (and neither is Python).

Python's json module simply takes the safe route and, by default, escapes non-ASCII characters. This is a valid way of representing such characters in JSON, and any conforming parser will restore the proper Unicode characters when parsing the string:

>>> import json
>>> json.dumps({'Crêpes': 5})
'{"Cr\\u00eapes": 5}'
>>> json.loads('{"Cr\\u00eapes": 5}')
{'Crêpes': 5}

Don't forget that JSON is just a representation of your data, and both "ê" and "\u00ea" are valid JSON representations of the string ê (the doubled backslash in the Python output above is only repr escaping). Conforming JSON parsers should handle both correctly.
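
As a quick check of that claim, the sketch below parses both spellings with the standard json module and compares the results. The variable names are only illustrative.

```python
import json

# Two JSON texts for the same one-character string: one contains the raw
# character, the other the \u escape (the raw string literal keeps the
# backslash as-is, so json.loads sees a real JSON escape sequence).
raw_text = '"ê"'
escaped_text = r'"\u00ea"'

raw_value = json.loads(raw_text)
escaped_value = json.loads(escaped_text)

# Both parse to the same Python str.
print(raw_value == escaped_value)
```

Which form a serialiser emits is therefore a matter of presentation; the parsed data is identical either way.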

It is possible to disable this behaviour, though, by passing ensure_ascii=False; see the json.dump documentation:

>>> json.dumps({'Crêpes': 5}, ensure_ascii=False)
'{"Crêpes": 5}'
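
For an HTTP response like the one in the question, one possible approach is to serialise with ensure_ascii=False and encode the result as UTF-8 explicitly. This is a minimal sketch using only the standard library; the helper name to_json_bytes is hypothetical, and the resulting bytes would still have to be handed to whatever response object the framework provides.

```python
import json


def to_json_bytes(value):
    """Serialise value to UTF-8 encoded JSON, keeping non-ASCII characters unescaped.

    Hypothetical helper for illustration; not part of the json module.
    """
    text = json.dumps(value, ensure_ascii=False)
    return text.encode('utf-8')


message = 'Crêpes'
body = to_json_bytes(message)
escaped = json.dumps(message).encode('utf-8')  # default: ASCII-only output

# Both bodies decode to the same string; only the bytes on the wire differ.
decoded_body = json.loads(body.decode('utf-8'))
decoded_escaped = json.loads(escaped.decode('utf-8'))
print(decoded_body == decoded_escaped == message)
```

In the question's view, passing such a body to HttpResponse with a UTF-8 content type should then show the accented character directly, assuming the client decodes the response as UTF-8.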

edited Feb 23 '16 at 16:31

answered Feb 23 '16 at 16:22


Cool Thanks for the clarification. – tkansara Feb 23 '16 at 16:23
@tkansara: and if you really want to change the default, you can, Python doesn't prevent you from producing full-on UTF-8. See [Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence](https://stackoverflow.com/q/18337407) – Martijn Pieters Feb 23 '16 at 17:21

score 3 · Answer 2 · edited May 23 '17 at 10:29


With respect to this answer, setting ensure_ascii=False makes json.dumps emit the special characters themselves in your output instead of \u escapes. marcelm's answer is still correct, though, as no information is lost in the escaped form.


answered Feb 23 '16 at 16:31

Dave J


interesting. I tested it and it worked fine for me. might be though, that I used python 3.5. – Dave J Feb 23 '16 at 16:40
