0

I know this has been asked before on Stackoverflow and on other sites but I cannot seem to be able to save a JSON file using escaped Unicode characters (Python3). I have read a lot of tutorials.

What am I missing? I have tried a lot of things but nothing works. I have also tried encoding/decoding in UTF-8 but I am obviously missing something.

Just to be clear, I have managed to get it working for other characters like й (0439) but I am having trouble with a single quote being encoded..

If I have the following dict:

import json
data = {"key": "Test \u0027TEXT\u0027 around"}

I want to save it exactly as it is in a new JSON file, but no matter what I do it always ends up as a single character, which is what is encoded in Unicode.

The following 2 blocks print the exact same thing: {"key": "Test 'TEXT' around"}.

print(json.dumps(data))
print(json.dumps(data, ensure_ascii=False))

Is there any way to keep the Unicode string literal? I want to have that very string as a value: "Test \u0027TEXT\u0027 around"

Serban Cezar
  • 507
  • 1
  • 8
  • 19
  • The thing this JSON is generated by an external source, so when I import my own data, it might crash as it could possibly be unable to work with single quotes as is. – Serban Cezar Mar 22 '20 at 16:49
  • 1
    Any tool that claims to support JSON must be able to deal with both `"Test \u0027TEXT\u0027 around"` and `"Test 'TEXT' around"`. The two variants are equivalent. If you want to enforce `\u0027` notation, you'll have to write and register a `JSONEncoder` subclass with your own implementation for serialising JSON strings. Or use `str.replace()` on the result of `json.dumps`, but that's very ugly. – lenz Mar 22 '20 at 20:07
  • Yes, the replace method is ugly but would be fine for my needs (very specific in this case). I still have not managed to get it working, even with raw strings. – Serban Cezar Mar 22 '20 at 20:53

1 Answers1

0

The behavior you are describing has nothing to do with JSON. This is simply how Python 3 handles strings. Open the shell and write:

>>> "Test \u0027TEXT\u0027 around"
"Test 'TEXT' around"

If you do not want Python to interpret the special characters, you should use raw strings (or maybe even byte sequences):

>>> r"Test \u0027TEXT\u0027 around"
'Test \\u0027TEXT\\u0027 around'

Reference:

Pavel Vergeev
  • 3,060
  • 1
  • 31
  • 39
  • No, of course not. See the reference to get a better understanding of the concept: https://docs.python.org/2.0/ref/strings.html – Pavel Vergeev Mar 22 '20 at 18:08
  • Oh, of course. What I'm trying to show is that OP may be concerned about the wrong thing (json dumps vs escape sequences in python). – Pavel Vergeev Mar 22 '20 at 18:28
  • I see that I was focusing on the wrong thing, I get that I need to work with raw strings but it still seems to be a kludge as it seems to be adding an extra backslash to my string. – Serban Cezar Mar 22 '20 at 20:55
  • 2
    Unfortunately, it does. This is the only way you can both store the value as a string and prevent the escape sequences from being interpreted. – Pavel Vergeev Mar 22 '20 at 22:01