Foreword
The only exception explicitly raised by the decoding code is json.JSONDecodeError, so the exception type does not help diagnose problems. What's interesting is the associated message. However, it is possible that decoding bytes to text fails before JSON decoding can be attempted. That is a separate issue beyond the scope of this post.
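Since the message and position carry the useful information, it can help to pull them out of the exception explicitly. The following is a minimal sketch; describe_json_error is a hypothetical helper name, and the 20-character context window is an arbitrary choice. The msg, lineno, colno, pos and doc attributes are provided by json.JSONDecodeError itself.

```python
import json


def describe_json_error(text):
    """Return None if text parses as JSON, else a dict describing the failure.

    Hypothetical helper for illustration only.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return {
            "msg": exc.msg,        # the bare message, without the position suffix
            "lineno": exc.lineno,  # 1-based line number
            "colno": exc.colno,    # 1-based column number
            "pos": exc.pos,        # 0-based index into the document
            # a little surrounding text, to see what the parser was looking at
            "context": exc.doc[max(exc.pos - 20, 0):exc.pos + 20],
        }
    return None


info = describe_json_error('{"key": "value"} extra')
if info is not None:
    print(info["msg"], info["lineno"], info["colno"], repr(info["context"]))
```

The rest of this post is organized by the value that ends up in msg.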
It's worth noting here that the JSON format documentation uses different terminology from Python. In particular, a portion of valid JSON data enclosed in {} is an object (not "dict") in JSON parlance, and a portion enclosed in [] is an array (not "list"). I will use JSON terminology when talking about the file contents, and Python terminology when talking about the parsed result or about data created directly by Python code.
As a general hint: use a dedicated JSON viewer to examine the file, or at least use a text editor that has some functionality to "balance" brackets (i.e., given that the insertion point is currently at a {, it will automatically find the matching }).
Not JSON
An error message saying Expecting value is a strong indication that the data is not intended to be JSON formatted at all. Carefully note the line and column position of the error for more information:
If the error occurs at line 1, column 1, it will be necessary to inspect the beginning of the file. It could be that the data is actually empty. If it starts with <, then that of course suggests XML or HTML rather than JSON.
Otherwise, there could be some padding preceding actual JSON content. Sometimes, this is to implement a security restriction in a web environment; in other cases it's to work around a different restriction. The latter case is called JSONP (JSON with Padding). Either way, it will be necessary to inspect the data to figure out how much should be trimmed from the beginning (and possibly also the end) before parsing.
Errors at other positions might arise because the data is actually the repr of some native Python data structure. Data like this can often be parsed using ast.literal_eval, but it should not be considered a practical serialization format: it doesn't interoperate well with code not written in Python, and using repr can easily produce data that can't be recovered this way (or in any practical way).
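As a rough illustration of recovering repr-style data, the sketch below falls back to ast.literal_eval when JSON parsing fails and then re-serializes the result as real JSON. The sample string is made up for the example; real data may contain objects whose repr cannot be evaluated at all.

```python
import ast
import json

# Made-up sample: the repr of a Python dict, not JSON
python_repr = "{'name': 'example', 'active': True, 'parent': None}"

try:
    data = json.loads(python_repr)
except json.JSONDecodeError:
    # Single quotes, True and None are not valid JSON, so parse the
    # text as a Python literal instead (this never executes code).
    data = ast.literal_eval(python_repr)

# Write it back out in a format other tools can read
as_json = json.dumps(data)
print(as_json)
```

If the producer of the data is under your control, the better fix is to have it call json.dumps in the first place.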
Note some common differences between Python's native object representations and the JSON format, to help diagnose the problem:
JSON uses only double quotes to surround strings; Python may also use single quotes, as well as triple-single ('''example''') or triple-double ("""example""") quotes.
JSON uses lowercase true and false rather than True and False to represent booleans. It uses null rather than None as a special "there is nothing here" value. Standard JSON has no representation for infinite or not-a-number floating-point values; Python's json module nevertheless accepts Infinity, -Infinity and NaN (rather than inf and nan) as an extension.
One subtlety: Expecting value can also indicate a trailing comma in an array. JSON syntax does not allow a trailing comma after listing elements or key-value pairs, although Python does. Although the comma is "extra", this is reported as something missing (the next element) rather than something extraneous (the comma). A trailing comma in an object is typically reported as Expecting property name enclosed in double quotes instead, for the same reason. The exact wording can vary between Python versions.
An error message saying Extra data indicates that there is more text after the end of the JSON data.
If the error occurs at line 2 column 1, this strongly suggests that the data is in fact in JSONL ("JSON Lines") format - a related format wherein each line of the input is a separate JSON entity (typically an object). Handling this is straightforward: iterate over the lines of the input, parse each one separately, and put the results in a list. For example, use a list comprehension: [json.loads(line) for line in open_json_file]. See Loading JSONL file as JSON objects for more.
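A slightly more defensive version of that list comprehension might look like the following sketch. The name load_jsonl is hypothetical, and skipping blank lines is an assumption about what the data looks like (a trailing empty line is common); io.StringIO stands in for an open file.

```python
import io
import json


def load_jsonl(lines):
    """Parse an iterable of text lines, one JSON value per line.

    Hypothetical helper; blank lines are skipped rather than treated as errors.
    """
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


# io.StringIO stands in for a file opened in text mode
sample = io.StringIO('{"id": 1}\n{"id": 2}\n\n{"id": 3}\n')
records = load_jsonl(sample)
print(records)
```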
Otherwise, the extra data could be part of JSONP padding. It can be removed before parsing; or else use the .raw_decode method of the JSONDecoder class:
>>> import json
>>> example = '{"key": "value"} extra'
>>> json.loads(example) # breaks because of the extra data:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 18 (char 17)
>>> parsed, size = json.JSONDecoder().raw_decode(example)
>>> parsed
{'key': 'value'}
>>> size # amount of text that was parsed.
16
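For the alternative of removing JSONP padding before parsing, one possible approach is a regular expression that peels off a callback-style wrapper. This is only a sketch: the pattern assumes padding of the form name(...) with an optional trailing semicolon, and loads_jsonp is a hypothetical name. Real-world padding varies, so inspect the data first.

```python
import json
import re

# Assumed shape of the padding: an identifier, "(", the payload, ")", optional ";"
_JSONP = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def loads_jsonp(text):
    """Parse text as JSON, first removing a callback(...) wrapper if present.

    Hypothetical helper for illustration only.
    """
    match = _JSONP.match(text)
    payload = match.group(1) if match else text
    return json.loads(payload)


print(loads_jsonp('callback({"key": "value"});'))
```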
Another possibility, especially likely if the error is on line 1 at a low column position (e.g. line 1, column 10), is that the data is in CSV format. For example, this was the case in https://stackoverflow.com/questions/75339096.
This can happen because the value in the "top-left cell of the spreadsheet" represented by the CSV file contains a comma. When that happens, the CSV format needs to surround that string in quotes (so that comma isn't confused for a separator); that makes it look like valid JSON (for a JSON that only contains one string) followed by "extra data" (the comma separating that from the next "cell", along with the rest of the CSV data).
For example, a valid CSV file could look like
"x,y",z
"(1, 2)",3
The "x,y" is valid JSON by itself (representing exactly what one might expect), but parsing the whole thing causes an error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 6 (char 5)
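If the data does turn out to be CSV, the standard library csv module is the appropriate tool. The sketch below feeds the same two-line sample to both parsers; io.StringIO stands in for an open file.

```python
import csv
import io
import json

text = '"x,y",z\n"(1, 2)",3\n'

try:
    json.loads(text)
    looks_like_json = True
except json.JSONDecodeError:
    # "x,y" parses as a JSON string, then the comma is "extra data"
    looks_like_json = False

# The csv module understands the quoting and splits the cells correctly
rows = list(csv.reader(io.StringIO(text)))
print(looks_like_json, rows)
```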
Invalid string literals
Error messages saying any of:
Invalid \uXXXX escape
Invalid \escape
Unterminated string starting at
Invalid control character
suggest that a string in the data isn't properly formatted, most likely due to a badly written escape code.
JSON strings can't contain control codes in strict mode (the default for parsing), so e.g. a newline must be encoded with \n. Note that the data must actually contain a backslash; when viewing a representation of the JSON data as a string, that backslash would then be doubled up (but not when, say, printing the string).
JSON doesn't accept Python's \x or \U escapes, only \u. To represent characters outside the BMP, use a surrogate pair:
>>> json.loads('"\\ud808\\udf45"') # encodes Unicode code point 0x12345 as a surrogate pair
'𒍅'
Unlike in Python string literals, a single backslash followed by something that doesn't make a valid escape sequence (such as a space) will not be accepted:
>>> json.loads('"\\ "') # the input string has only one backslash
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 1 column 2 (char 1)
Similarly, single-quotes must not be escaped within JSON strings, although double-quotes must be.
When debugging or testing an issue like this at the REPL, it's important not to get confused between JSON's escaping and Python's.
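One way to keep the two layers of escaping apart when experimenting is to write the JSON text as a Python raw string, so that Python leaves the backslashes alone. The sketch below contrasts the variants; the sample strings are made up, and strict=False is the json.loads option that relaxes the control-character rule mentioned above.

```python
import json

# Raw string: Python passes backslash + n through, JSON decodes it to a newline
two_lines = json.loads(r'"first\nsecond"')

# Equivalent without a raw string: the backslash is doubled for Python's benefit
same = json.loads('"first\\nsecond"')

try:
    # Here Python itself turns \n into a real newline, which strict JSON rejects
    json.loads('"first\nsecond"')
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

# Non-strict mode tolerates literal control characters inside strings
lenient = json.loads('"first\nsecond"', strict=False)

print(repr(two_lines), strict_ok)
```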
Wrong brackets
Expecting ',' delimiter and Expecting ':' delimiter imply a mismatch between the brackets used for an object or array and the contents. For example, JSON like ["foo": "bar"] was almost certainly intended to represent an object, so it should have enclosing {} rather than []. Look at the line and character position where the error was reported, and then scan backwards to the enclosing bracket.
However, these errors can also mean exactly what they say: there might simply be a comma missing between array elements or key-value pairs, or a colon missing between a key and its value.
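Both causes produce the same message, which is why the position matters more than the wording. A small sketch to demonstrate (error_message is a hypothetical helper that returns the msg attribute of the exception, or None on success):

```python
import json


def error_message(text):
    """Return the JSONDecodeError message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.msg
    return None


# Object contents inside array brackets
wrong_brackets = error_message('["foo": "bar"]')
# Genuinely missing comma between two array elements
missing_comma = error_message('["foo" "bar"]')
# Array contents inside object brackets
wrong_braces = error_message('{"foo", "bar"}')
# Genuinely missing colon between key and value
missing_colon = error_message('{"foo" "bar"}')

print(wrong_brackets, "|", wrong_braces)
```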
Invalid key
While Python allows anything hashable as a dict key, JSON requires strings for its object keys. This problem is indicated by Expecting property name enclosed in double quotes. While it could occur in hand-written JSON, it more likely indicates data that was inappropriately created by using repr on a Python object. (This is especially likely if, upon checking the indicated location in the file, it appears that there is an attempt at a string key in single quotes.)
The error message Expecting property name enclosed in double quotes could also indicate a "wrong brackets" problem. In particular, if the data should be an array that contains integers, but was enclosed in {} instead of [], the parser would be expecting a double-quoted string key before anything else, and complain about the first integer in the list.
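The two situations described in this section can be reproduced as follows. This is a sketch with made-up inputs; error_message is a hypothetical helper. The last two lines also show a related point: json.dumps converts non-string keys such as integers to strings, so they do not survive a round trip unchanged.

```python
import json


def error_message(text):
    """Return the JSONDecodeError message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.msg
    return None


# repr-style data: the key is in single quotes
single_quoted = error_message("{'key': 'value'}")
# An array of integers mistakenly wrapped in {}
braces_for_array = error_message("{1, 2, 3}")

# Going the other way, integer keys are written out as strings
dumped = json.dumps({1: "one"})
round_trip = json.loads(dumped)

print(single_quoted)
print(dumped, round_trip)
```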