58

I have a string like this:

s = u"""{"desc": "\u73cd\u54c1\u7f51-\u5168\u7403\u6f6e\u6d41\u5962\u54c1\u7f51\u7edc\u96f6\u552e\u5546 <br \/>\r\nhttp:\/\/www.zhenpin.com\/ <br \/>\r\n<br \/>\r\n200\u591a\u4e2a\u56fd\u9645\u4e00\u7ebf\u54c1\u724c\uff0c\u9876\u7ea7\u4e70\u624b\u5168\u7403\u91c7\u8d2d\uff0c100%\u6b63\u54c1\u4fdd\u969c\uff0c7\u5929\u65e0\u6761\u2026"}"""

json.loads(s) raises an error like this:

ValueError: Invalid control character at: line 1 column 33 (char 33)

Why does this error occur? How can I solve this problem?

Martin Thoma
福气鱼
  • 3
    possible duplicate of [json.loads(jsonstring) in Python fails if string has a "\r" i.e. carriage return character](http://stackoverflow.com/questions/8324169/json-loadsjsonstring-in-python-fails-if-string-has-a-r-i-e-carriage-return) – Kimvais Feb 15 '12 at 14:52

5 Answers

125

Another option is to pass the strict=False argument.

According to http://docs.python.org/2/library/json.html:

"If strict is False (True is the default), then control characters will be allowed inside strings. Control characters in this context are those with character codes in the 0-31 range, including '\t' (tab), '\n', '\r' and '\0'."

For example:

json.loads(json_str, strict=False)
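As a rough illustration of the difference, here is a minimal sketch written for Python 3 (the thread itself uses Python 2). The sample document `raw` is made up for the example; it has a literal carriage return and newline inside a string value.

```python
import json

# Hypothetical sample: the \r\n here are real control characters, because
# Python interprets them before the JSON parser ever sees the text.
raw = '{"desc": "line one\r\nline two"}'

try:
    json.loads(raw)  # strict=True is the default
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
    print("strict parse failed:", exc)

# With strict=False the control characters are accepted inside strings.
data = json.loads(raw, strict=False)
print(repr(data["desc"]))
```

Note that strict=False only relaxes the rule about control characters inside strings; other syntax errors are still reported.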
calmrat
69

The problem is your unicode string contains carriage returns (\r) and newlines (\n) within a string literal in the JSON data. If they were meant to be part of the string itself, they should be escaped appropriately. If they weren't meant to be part of the string, they shouldn't be in your JSON either.

If you can't fix the source so that it produces valid JSON, you could either remove the offending characters:

>>> json.loads(s.replace('\r\n', ''))

or escape them manually:

>>> json.loads(s.replace('\r\n', '\\r\\n'))
Thomas Wouters
12

The problem is that the character at index 33 is a carriage return control character.

>>> s[33]
u'\r'
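To look for such characters when the position is not known in advance, something like the following sketch might help. The helper name `find_control_chars` and the sample string are made up for illustration. It reports every character below code point 32, including tabs and line breaks that sit between tokens (where JSON does allow them), so treat the result as a list of candidates rather than definite errors.

```python
def find_control_chars(text):
    """Return (index, repr) pairs for every character with a code point below 32."""
    return [(i, repr(ch)) for i, ch in enumerate(text) if ord(ch) < 32]


# Hypothetical sample with a literal CR, LF and tab inside the string value.
sample = '{"desc": "a\r\nb\tc"}'
hits = find_control_chars(sample)
for index, char in hits:
    print(index, char)
```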

According to the JSON spec, valid characters are:

  • Any Unicode character except: ", \, and control-characters (ord(char) < 32).

  • The following character sequences are allowed: \", \\, \/, \b (backspace), \f (form feed), \n (line-feed/new-line), \r (carriage return), \t (tab), or \u followed by four hexadecimal digits.

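However, in a Python string literal you have to double the backslashes of these escape sequences (unless the literal is raw), because Python interprets sequences such as \r and \n itself and turns them into actual control characters before the JSON parser sees them.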

>>> s = ur"""{"desc": "\u73cd\u54c1\u7f51-\u5168\u7403\u6f6e\u6d41\u5962\u54c1\u7f51\u7edc\u96f6\u552e\u5546 <br \/>\r\nhttp:\/\/www.zhenpin.com\/ <br \/>\r\n<br \/>\r\n200\u591a\u4e2a\u56fd\u9645\u4e00\u7ebf\u54c1\u724c\uff0c\u9876\u7ea7\u4e70\u624b\u5168\u7403\u91c7\u8d2d\uff0c100%\u6b63\u54c1\u4fdd\u969c\uff0c7\u5929\u65e0\u6761\u2026"}"""
>>> json.loads(s)
{u'desc': u'\u73cd\u54c1\u7f51-\u5168\u7403\u6f6e\u6d41\u5962\u54c1\u7f51\u7edc\u96f6\u552e\u5546 <br />\r\nhttp://www.zhenpin.com/ <br />\r\n<br />\r\n200\u591a\u4e2a\u56fd\u9645\u4e00\u7ebf\u54c1\u724c\uff0c\u9876\u7ea7\u4e70\u624b\u5168\u7403\u91c7\u8d2d\uff0c100%\u6b63\u54c1\u4fdd\u969c\uff0c7\u5929\u65e0\u6761\u2026'}
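The same point in Python 3 terms, as a small sketch with made-up sample strings (the `ur` prefix used above exists only in Python 2): a regular literal hands the parser real control characters, while a doubled backslash or a raw literal hands it the two-character JSON escape sequences.

```python
import json

plain = '{"text": "a\r\nb"}'      # Python converts \r\n to real control characters
escaped = '{"text": "a\\r\\nb"}'  # doubled backslashes reach the parser intact
raw = r'{"text": "a\r\nb"}'       # raw literal: same characters as `escaped`

assert escaped == raw

# The JSON parser decodes the escape sequences into the actual characters.
parsed = json.loads(raw)
print(repr(parsed["text"]))

# The regular literal is rejected in the default strict mode.
try:
    json.loads(plain)
    plain_ok = True
except ValueError:
    plain_ok = False
print("plain literal parsed:", plain_ok)
```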

Uyghur Lives Matter
  • 2
    What if the string is in a variable? For instance, I'm receiving (via an HTTP POST) a JSON object like this: `{"text": "Hello,\n How are you?"}`. I obviously cannot use `r''` to make a raw string from this. How can I ask Python to treat it as such, or is it too late and now I need to use some sort of string replacement? – orokusaki Sep 11 '14 at 21:31
  • 1
    @orokusaki If the JSON you're receiving has literal control characters instead of the proper character sequences, it is indeed too late because the JSON was not properly generated. So you would have to do some string replacement in Python if you can't control the initial generation. – Uyghur Lives Matter Sep 15 '14 at 14:02
  • Thanks for the reply. I ended up just passing `strict=False` to `loads`, which I felt might be a cleaner solution - we'll see if it comes back to bite me :/ – orokusaki Sep 15 '14 at 14:16
8

Try escaping the \r and \n characters before parsing:

>>> s = s.replace('\r', '\\r').replace('\n', '\\n')
>>> json.loads(s)
{u'desc': u'\u73cd\u54c1\u7f51-\u5168\u7403\u6f6e\u6d41\u5962\u54c1\u7f51\u7edc\u96f6\u552e\u5546 <br />\r\nhttp://www.zhenpin.com/ <br />\r\n<br />\r\n200\u591a\u4e2a\u56fd\u9645\u4e00\u7ebf\u54c1\u724c\uff0c\u9876\u7ea7\u4e70\u624b\u5168\u7403\u91c7\u8d2d\uff0c100%\u6b63\u54c1\u4fdd\u969c\uff0c7\u5929\u65e0\u6761\u2026'}
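If other control characters might show up as well, one possible generalization is to escape everything below code point 32 in one pass. This is only a sketch for Python 3: the helper `escape_control_chars` is hypothetical, and it assumes the control characters occur only inside string values. On pretty-printed JSON it would also escape the line breaks between tokens and break the document, in which case strict=False is probably the safer route.

```python
import json
import re

# Matches any single character in the range U+0000 to U+001F.
_CONTROL = re.compile(r'[\x00-\x1f]')


def escape_control_chars(text):
    """Replace each raw control character with its \\uXXXX JSON escape."""
    return _CONTROL.sub(lambda m: '\\u%04x' % ord(m.group()), text)


# Hypothetical single-line sample with CR, LF and U+0001 inside the string.
broken = '{"desc": "a\r\nb\x01"}'
fixed = escape_control_chars(broken)
result = json.loads(fixed)
print(repr(result["desc"]))
```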
Bogdan
  • This is part of what I got from another site's API, and I don't know whether it contains other invalid characters. Do you know which other characters are invalid? – 福气鱼 Feb 16 '12 at 01:38
0

In some cases, this error is raised because a string in the file contains a literal whitespace control character, such as a tab or a line break (an ordinary space is fine). Deleting or escaping that character solves the problem.

sheldonkreger
  • try rewriting your verbiage, which in its current form is more suited to be a comment, and prose it in the form of an answer. Describe what you believe to be the issue and your recommended solution. – Mike McMahon Nov 20 '14 at 00:02
  • 1
    thread revival but FWIW, this answer solved the error for me in my searches. Logged in to give you your vote. Thanks sheldon – Jordan Wayne Crabb Dec 05 '18 at 00:38