Valid JSON expects escaped newline characters to be encoded as '\\n', with two backslashes. I have data containing newline characters that I want to save to a file. Here's a simplified version:

data = {'mystring': 'Line 1\nLine 2'}

I can encode it with json.dumps():

import json
json_data = json.dumps(data)
json_data
# -> '{"mystring": "Line 1\\nLine 2"}'

When I print it, the newline displays as '\n', not '\\n' (which I find odd but I can live with):

print(json_data)
# -> {"mystring": "Line 1\nLine 2"}

However (and here's the problem), when I write it to a file, the file no longer contains valid JSON:

with open('mydata.json', 'w') as f:
    f.write(json_data)

If I open the file and read it, it contains this:

{"mystring": "Line 1\nLine 2"}

but I was hoping for this:

{"mystring": "Line 1\\nLine 2"}

Oddly (I think), if I read the file using Python's open(), the JSON data is considered valid:

with open('mydata.json', 'r') as f:
    json_data = f.read()
json_data
# -> '{"mystring": "Line 1\\nLine 2"}'

... and it decodes OK:

json.loads(json_data)
# -> {u'mystring': u'Line 1\nLine 2'}

My question is: why is the data in the file not valid JSON? If another, non-Python application needs to read it, the result would probably be incorrect. If I copy and paste the file contents and call json.loads() on them, it fails:

import json
json.loads('{"mystring": "Line 1\nLine 2"}')
# -> ValueError: Invalid control character at: line 1 column 21 (char 20)

Can anybody explain whether this is the expected behaviour, or am I doing something wrong?

plakias
  • Just to explain: `the newline displays as '\n', not '\\n' (which I find odd but I can live with)`. That's because `\\ ` is the escape character for printing `\ ` itself. I'm not certain this is your problem but I suspect that in order to actually write two backslashes, you need to give python `\\\\n` – SuperBiasedMan Jul 03 '15 at 09:35

2 Answers

You ran into the pitfall of forgetting that the \ character is also the escape character in Python string literals. Try printing the last example instead of calling json.loads:

>>> print('{"mystring": "Line 1\nLine 2"}')
{"mystring": "Line 1
Line 2"}

There is no way the above is valid JSON, because a JSON string cannot contain a literal line break. What if the \ character is correctly escaped?

>>> print('{"mystring": "Line 1\\nLine 2"}')
{"mystring": "Line 1\nLine 2"}

Much better. You can then decode it:

>>> json.loads('{"mystring": "Line 1\\nLine 2"}')
{'mystring': 'Line 1\nLine 2'}

Alternatively, if you want to be able to copy text from some other buffer and paste it into your live interpreter to decode it, consider using the raw prefix (r'') for your string:

>>> print(r'{"mystring": "Line 1\nLine 2"}')
{"mystring": "Line 1\nLine 2"}
>>> json.loads(r'{"mystring": "Line 1\nLine 2"}')
{'mystring': 'Line 1\nLine 2'}

Note that \n is no longer interpreted as a newline escape: in a raw string the backslash stays a literal backslash.
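The three ways of writing the literal can be compared side by side. This is only a minimal sketch using the strings from the question; the variable names `escaped`, `raw`, `broken` and `rejected` are made up for illustration.

```python
import json

# Backslash escaped in the literal: the string holds '\' followed by 'n'.
escaped = '{"mystring": "Line 1\\nLine 2"}'
# Raw literal: the backslash is kept as typed, so this is the same text.
raw = r'{"mystring": "Line 1\nLine 2"}'
# Plain literal: Python turns \n into one real newline character.
broken = '{"mystring": "Line 1\nLine 2"}'

# The first two spellings produce identical strings.
print(escaped == raw)

# The broken one is one character shorter (newline instead of '\' + 'n').
print(len(raw), len(broken))

# The valid text decodes to a Python string containing a real newline.
decoded = json.loads(raw)
print(repr(decoded["mystring"]))

# The broken text is rejected, since JSON strings may not contain
# unescaped control characters.
try:
    json.loads(broken)
    rejected = False
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```

Pasting the file contents into a plain '...' literal produces the `broken` variant, which is why that experiment fails even though the file itself is fine.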

Also see: How do I handle newlines in JSON? and note how this is not a problem that exists strictly within Python.

metatoaster
  • The r'' raw string syntax will be really helpful. Thanks so much. – plakias Jul 03 '15 at 10:36
  • Out of interest, .encode('string-escape') also works, e.g. json.loads('{"mystring": "Line 1\nLine2"}'.encode('string-escape')) – plakias Jul 03 '15 at 10:56

The reason for this:

print(json_data)
# -> {"mystring": "Line 1\nLine 2"}

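is that \\ is an escape sequence that stands for a single backslash \. The interactive echo shows the string's repr, where the backslash is doubled; print() shows the actual characters, where there is only one.

The data in the JSON file is valid, as the parser is able to parse it :)

The confusion stems from the difference between how a string is displayed and what it contains. The sequence \\n in the repr is the two characters \n in the actual string, which is exactly what f.write() puts in the file and what JSON expects for an escaped newline.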
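To check this without relying on how the interpreter displays strings, one can count the backslashes and round-trip the data through a file. The following is a rough sketch, assuming Python 3; the temporary directory and the names `echoed`, `backslashes`, `on_disk` and `restored` are illustrative and not part of the question.

```python
import json
import os
import tempfile

data = {"mystring": "Line 1\nLine 2"}
json_data = json.dumps(data)

# repr() is what the interactive echo shows: the backslash appears doubled.
echoed = repr(json_data)
print(echoed)

# The actual text contains exactly one backslash, followed by 'n'.
backslashes = json_data.count("\\")
print(backslashes)

# Write the text to a file in a throwaway directory (hypothetical location).
path = os.path.join(tempfile.mkdtemp(), "mydata.json")
with open(path, "w") as f:
    f.write(json_data)

# Read it back: the file holds the same characters, with no real newline.
with open(path) as f:
    on_disk = f.read()
print(on_disk == json_data, "\n" in on_disk)

# Decoding restores the original data, including the real newline.
restored = json.loads(on_disk)
print(restored == data)
```

The file therefore contains a single backslash followed by n, which is the form a non-Python JSON parser expects as well.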

Radu Diță