3

I am getting a JSON string that contains a "\r" character somewhere, e.g. {"data":"foo \r\n bar"}. When I try to parse it, json.loads throws a ValueError.

>>> j="""{"data":"foo \r\n bar"}"""
>>> import json
>>> f=json.loads(j)

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    f=json.loads(j)
  File "C:\Python27\lib\json\__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python27\lib\json\decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1 column 13 (char 13)
>>> j[13]
'\r'

"\r" is a perfectly legal character in a Python string.

How can I parse this JSON string, such that

>>> dct = somehow_parse_json(j)
>>> dct['data']
'foo \r\n bar'

I could easily find and remove the carriage return characters, but I would prefer to preserve them.

Thanatos
Optimus

2 Answers

5

You should escape backslashes in JSON:

j="""{"data":"foo \\r\\n bar"}"""

If you don't escape them, the Python string is still valid, but the JSON it contains is not: JSON does not allow raw control characters inside a string.
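A minimal sketch of the difference, assuming the same payload as in the question (the variable names `raw`, `escaped`, `raw_is_valid` and `parsed` are only illustrative):

```python
import json

# One backslash in the Python literal: Python turns \r and \n into real
# control characters, which the JSON parser rejects inside a string.
raw = """{"data":"foo \r\n bar"}"""

# Two backslashes: the JSON text holds the two-character escapes \r and \n,
# which the JSON parser decodes back into the control characters.
escaped = """{"data":"foo \\r\\n bar"}"""

try:
    json.loads(raw)
    raw_is_valid = True
except ValueError:
    # Raised for the raw control character, as in the question's traceback.
    raw_is_valid = False

parsed = json.loads(escaped)
```

Here `parsed['data']` ends up containing the real carriage return and line feed, so nothing is lost by escaping.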

DrTyrsa
  • 4
    Raw strings would make it easier to read, additionally: `j = r"""{"data":"foo \r\n bar"}"""` – Thanatos Nov 30 '11 at 10:15
  • @DrTyrsa - I am not creating the json, so i would have to "seek and escape" – Optimus Nov 30 '11 at 10:20
  • 1
    @Thanatos - r"""{"data":"foo \\r\\n bar"}""" solved the issue, thanks, you should have posted it as an answer :) – Optimus Nov 30 '11 at 10:22
  • 1
    @Optimus How has it solved it? `r"""{"data":"foo \r\n bar"}"""` is exactly the same as `"""{"data":"foo \\r\\n bar"}"""`. – DrTyrsa Nov 30 '11 at 10:25
  • 2
    @Optimus: The `r"""..."""` string only had a single slash before the `r`, i.e., `\r\n`. See my comment again. The `r"""..."""` string I posted is simply a different way of writing what DrTyrsa posted, the only difference between the two is for the human reader. – Thanatos Nov 30 '11 at 10:26
  • @DrTyrsa - yes, r"""{"data":"foo \r\n bar"}""" is exactly the same as """{"data":"foo \\r\\n bar"}""", both work, but i don't have to add additional slashes when using this r"""{"data":"foo \r\n bar"}""", thanks, – Optimus Nov 30 '11 at 12:47
1

Logically, Python is doing the right thing here.

It's the same old CRLF (inherited from typewriters): CR = Carriage Return, LF = Line Feed.

'\r' stands for CR and '\n' stands for LF. Both are control characters, so my point is that a raw '\r' inside a JSON string is definitely not valid.

For example: print '\n 123456\rone' # shows one3456 on a terminal, because CR moves the cursor back to the start of the line

Now, how to use \r anyway?

# if j is your json
j = j.replace('\r','\\r')

That replaces each raw \r with the escaped form \\r and leaves everything else untouched.
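As a rough sketch of that workaround on the string from the question: the raw \n is a control character too, so it needs the same replacement. This assumes the control characters only occur inside string values, since an escaped \\r between tokens would itself be invalid JSON. The standard `json.loads` also accepts `strict=False`, which allows control characters inside strings with no rewriting at all; whether that fits a given feed is something to check. The names `escaped`, `dct_escaped` and `dct_lenient` are only illustrative.

```python
import json

j = """{"data":"foo \r\n bar"}"""

# Option 1: escape the raw control characters before parsing.
# Assumes they only appear inside string values, not between tokens.
escaped = j.replace('\r', '\\r').replace('\n', '\\n')
dct_escaped = json.loads(escaped)

# Option 2: ask the decoder to tolerate control characters inside strings.
dct_lenient = json.loads(j, strict=False)
```

Both options keep the carriage return in `dct['data']` rather than dropping it.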

Yugal Jindle
  • json is coming from facebook graph api, If it is not valid json, i will still have to parse it, thanks for help anyways – Optimus Nov 30 '11 at 12:49