20

According to this answer, newlines in a JSON string should always be escaped. This does not appear to be necessary when I load the JSON with json.load().

I've saved the following string to file:

{'text': 'Hello,\n How are you?'}

Loading the JSON with json.load() does not throw an exception, even though the \n is not escaped:

>>> with open('test.json', 'r') as f:
...   json.load(f)
...
{'text': 'Hello,\n How are you?'}

However, if I use json.loads(), I get an exception:

>>> s
'{"text": "Hello,\n How are you?"}'
>>> json.loads(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python34\lib\json\__init__.py", line 318, in loads
    return _default_decoder.decode(s)
  File "c:\Python34\lib\json\decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "c:\Python34\lib\json\decoder.py", line 359, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1 column 17 (char 16)

My questions:

  1. Does json.load() automatically escape \n inside the file object?
  2. Should one always do \\n regardless of whether the JSON will be read by json.load() or json.loads()?
Dirty Penguin
  • You are not loading the same string... The one you wrote is simply not valid JSON. The file probably has the `\n` escaped – JBernardo Aug 08 '17 at 14:18
  • @JBernardo As far as I know, the file is the same: I saved the file. – Dirty Penguin Aug 08 '17 at 14:19
  • Then there is your problem... You actually wrote `\n` in a file with a text editor, which is the same as escaping it. BTW you should not write JSON by hand. There is a reason you have a `dumps` function – JBernardo Aug 08 '17 at 14:21
  • So when Python reads a file, does it automatically escape any `\n` it finds? Or does the file editor do something special when it saves the file to disk? – Dirty Penguin Aug 08 '17 at 14:23
  • Btw, this is a bit of a contrived example. My actual use case is reading from an API which provides a JSON file containing unescaped `\n` chars. – Dirty Penguin Aug 08 '17 at 14:25
  • @DirtyPenguin you are mixing up `repr` and actual value of string. There is a semantic difference between _string with newline character_ and _string with backslash followed by letter n_. – Łukasz Rogalski Aug 08 '17 at 14:26

3 Answers

25

json.load() reads from a file object (anything with a .read() method) and json.loads() reads from a string.

Within your file, the \n is stored as two characters, a backslash followed by the letter n. That is a valid JSON escape, so the decoder accepts it and turns it into a newline character in the resulting Python string.

But within a Python string literal, if you don't double the backslash (\\n), Python itself converts \n into a real newline before the loader sees it. The loader then finds a raw control character, which JSON does not allow inside a string; only the two-character control sequence \n is valid there.

By doubling the backslash you actually get a real string with \n in it, and only then will the JSON decoder transform the \n into a newline char.
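
To make the difference concrete, here is a minimal sketch. It assumes the file holds a backslash followed by n, uses a raw string literal to mimic those file contents, and lets io.StringIO stand in for the opened test.json. The variable names are only illustrative.

```python
import io
import json

# What the file on disk holds: a backslash followed by the letter n (two characters).
# A raw string literal keeps the backslash as-is, mimicking the file contents.
file_text = r'{"text": "Hello,\n How are you?"}'

# json.load() reads from a file-like object; io.StringIO stands in for open('test.json').
from_file = json.load(io.StringIO(file_text))

# json.loads() on the very same characters gives the same result.
from_string = json.loads(file_text)

# A normal (non-raw) literal: Python turns \n into a real newline before json sees it.
literal = '{"text": "Hello,\n How are you?"}'
try:
    json.loads(literal)
    literal_error = None
except ValueError as exc:  # JSONDecodeError is a ValueError subclass on newer Pythons
    literal_error = str(exc)

print(from_file == from_string)
print(literal_error)
```

So load() and loads() behave the same way when given the same characters. What differs is what the Python literal hands to the decoder.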

Fabien
  • Could you elaborate on the difference between "control character" and "control sequence"? – Dirty Penguin Aug 08 '17 at 14:27
  • There is not much of a difference, it's synonymous here. What I mean is that the escape characters for JSON are restricted for purposes such as encoding binary data, not newlines. While in Python the control sequence can be used to encode special characters, such as `\t` or `\n`... – Fabien Aug 08 '17 at 14:29
10

EDITED: Already answered here: https://stackoverflow.com/a/16544933/1054458

Maybe the strict option can help:

test.py:

import json

s = '''{
"asdf":"foo
bar"
}'''

print(json.loads(s, strict=False)["asdf"])

output:

$> python test.py
foo
bar
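
For comparison, a short sketch of the default behaviour next to strict=False, assuming the input really does contain a raw newline inside a string (as in the API case mentioned in the comments). json.load() forwards the same keyword to the decoder, and io.StringIO is used here only as a stand-in for a real file.

```python
import io
import json

raw = '{"asdf": "foo\nbar"}'  # a real newline character inside the JSON string

# Default (strict=True): control characters inside strings are rejected.
try:
    json.loads(raw)
    strict_ok = True
except ValueError:
    strict_ok = False

# strict=False: control characters inside strings are tolerated.
relaxed = json.loads(raw, strict=False)

# The same option works when reading from a file-like object.
relaxed_from_file = json.load(io.StringIO(raw), strict=False)

print(strict_ok)
print(relaxed["asdf"])
```

Keep in mind that such input is still not valid JSON, so other parsers may reject it.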
FarK
-1

The mistake here is this: when you open the text file in Notepad and it shows:

{'text': 'Hello,\n How are you?'}

The "\" and "n" are separate characters, like any other characters in this file.

When you write this in a Python program:

s='{"text": "Hello,\n How are you?"}'

and run a quick test:

>>> s[15]
','
>>> s[16]
'\n'
>>> s[17]
' '

Don't miss the most interesting part: the \n here is ONE character, s[16], with ASCII code 10, which is a control character.

This control character is Line Feed, i.e. a newline. Because this control character is present, the string fails to load as JSON.

You actually have to write

s='{"text": "Hello,\\n How are you?"}'

to make it exactly the same as in the text file.
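
If you would rather not write the escaping by hand, json.dumps() produces it for you, as one of the comments on the question suggests. A small sketch, with illustrative variable names:

```python
import json

# The value contains a real newline (one character, ASCII 10).
data = {"text": "Hello,\n How are you?"}

# The encoder writes that newline as two characters: a backslash and an n.
encoded = json.dumps(data)

# Decoding the encoded text gives back the original value.
decoded = json.loads(encoded)

print(encoded)
print(decoded == data)
```

The encoded text contains no raw newline, which is why it loads without the strict option.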

Ben Lin