A leading 0 in a number literal in JSON is invalid unless the number literal is only the character 0
or starts with 0.
. The Python json
module is quite strict in that it will not accept such number literals. In part because a leading 0 is sometimes used to denote octal notation rather than decimal notation. Deserialising such numbers could lead to unintended programming errors. That is, should 010
be parsed as the number 8
(in octal notation) or as 10
(in decimal notation).
You can create a decoder that will do what you want, but you will need to heavily hack the json
module or rewrite much of its internals. Either way, you will see a performance slow down as you will no longer be using the C implementation of the module.
Below is an implementation that can decode JSON which contains numbers with any number of leading zeros.
import json
import re
import threading
# a more lenient number regex (modified from json.scanner.NUMBER_RE)
NUMBER_RE = re.compile(
r'(-?(?:\d*))(\.\d+)?([eE][-+]?\d+)?',
(re.VERBOSE | re.MULTILINE | re.DOTALL))
# we are going to be messing with the internals of `json.scanner`. As such we
# want to return it to its initial state when we're done with it, but we need to
# do so in a thread safe way.
_LOCK = threading.Lock()
def thread_safe_py_make_scanner(context, *, number_re=json.scanner.NUMBER_RE):
with _LOCK:
original_number_re = json.scanner.NUMBER_RE
try:
json.scanner.NUMBER_RE = number_re
return json.scanner._original_py_make_scanner(context)
finally:
json.scanner.NUMBER_RE = original_number_re
json.scanner._original_py_make_scanner = json.scanner.py_make_scanner
json.scanner.py_make_scanner = thread_safe_py_make_scanner
class MyJsonDecoder(json.JSONDecoder):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
# overwrite the stricter scan_once implementation
self.scan_once = json.scanner.py_make_scanner(self, number_re=NUMBER_RE)
d = MyJsonDecoder()
n = d.decode('010')
assert n == 10
json.loads('010') # check the normal route still raise an error
I would stress that you shouldn't rely on this as a proper solution. Rather, it's a quick hack to help you decode malformed JSON that is nearly, but not quite valid. It's useful if recreating the JSON in a valid form is not possible for some reason.