
I apologize if this question has been asked earlier. I am still not clear about encoding in Python 3.2.

I am reading a CSV file (encoded in UTF-8 without a BOM) that contains French accented characters.

Here is the code that opens and reads the CSV file:

import csv

with open(in_file, 'r', encoding='utf-8', newline='') as csvfile:
    fieldnames = ("id", "locale", "message")
    reader = csv.DictReader(csvfile, fieldnames, escapechar="\\")
    for row in reader:
        # keep the message of the row matching the requested id and locale
        if row['id'] == id and row['locale'] == locale:
            out = row['message']

I return the message (out) as JSON:

jsonout = json.dumps(out, ensure_ascii=True)    
return HttpResponse(jsonout,content_type="application/json; encoding=utf-8")

However when I preview the result I get the accent e(French) being replaced by \u00e9.

Can you please advise on what I am doing wrong, and what I should do so that the JSON output shows the proper accented e?

Thanks

tripleee
tkansara
  • *“However when I preview the result I get the accent e(French) being replaced by \u00e9.”* – That sounds right to me. What is your question? – poke Feb 23 '16 at 16:17
  • Yeah, if you do `print('\u00e9')` in Python it prints é, so the representation is correct. If you are seeing `\u00e9`, that just means the program you are viewing it in doesn't understand the accent. – Tadhg McDonald-Jensen Feb 23 '16 at 16:21
  • The HttpResponse prints out the response as \u00e9. I would like to know what change I need to make so that the response prints out e with an accent. – tkansara Feb 23 '16 at 16:22
  • Duplicate of https://stackoverflow.com/questions/18337407/saving-utf-8-texts-with-json-dumps-as-utf8-not-as-u-escape-sequence – tripleee Feb 10 '21 at 11:22

2 Answers


You're doing nothing wrong (and neither is Python).

Python's json module simply takes the safe route and escapes non-ASCII characters. This is a valid way of representing such characters in JSON, and any conforming parser will restore the proper Unicode characters when parsing the string:

>>> import json
>>> json.dumps({'Crêpes': 5})
'{"Cr\\u00eapes": 5}'
>>> json.loads('{"Cr\\u00eapes": 5}')
{'Crêpes': 5}

Don't forget that JSON is just a representation of your data, and both "ê" and "\u00ea" are valid JSON representations of the string ê. Conforming JSON parsers should handle both correctly.
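As a small sketch of that equivalence, the snippet below parses two hand-written JSON texts, one using the \u escape and one using the raw character, and compares the results. The variable names are only illustrative.

```python
import json

# Two JSON texts for the same one-character string.
escaped_text = '"\\u00ea"'  # the JSON text "\u00ea" (ASCII only)
literal_text = '"ê"'        # the JSON text "ê" (raw non-ASCII character)

decoded_escaped = json.loads(escaped_text)
decoded_literal = json.loads(literal_text)

# Both texts decode to the same Python string.
print(decoded_escaped == decoded_literal)
```

The comparison should print True, since the parser turns the escape back into the character it stands for.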

It is possible to disable this behaviour, though, by passing ensure_ascii=False; see the json.dump documentation:

>>> json.dumps({'Crêpes': 5}, ensure_ascii=False)
'{"Crêpes": 5}'
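Applied to the question's case, this might look roughly like the sketch below. It assumes you want the response body to carry the accented characters as UTF-8 bytes; `build_json_body` is a hypothetical helper, not part of the json module or Django. Note that the Content-Type parameter for the character set is named `charset`, not `encoding`.

```python
import json

def build_json_body(message):
    """Hypothetical helper: serialize to JSON without \\u escapes, as UTF-8 bytes."""
    text = json.dumps(message, ensure_ascii=False)
    return text.encode("utf-8")

body = build_json_body("café")
content_type = "application/json; charset=utf-8"

# In a Django view, one would presumably end with something like:
#     return HttpResponse(body, content_type=content_type)
```

A client that decodes the body as UTF-8 and parses it gets back the same string it would get from the escaped form.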
marcelm
  • Cool Thanks for the clarification. – tkansara Feb 23 '16 at 16:23
  • @tkansara: and if you really want to change the default, you can, Python doesn't prevent you from producing full-on UTF-8. See [Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence](https://stackoverflow.com/q/18337407) – Martijn Pieters Feb 23 '16 at 17:21

With respect to this answer, setting ensure_ascii=False makes the special characters appear unescaped in your output. That said, marcelm's answer is still correct, as no information is lost in either representation.

Community
Dave J