I convert a JSON string to a Python object using the json library:

a = '{"index":1}'
import json
json.loads(a)
{'index': 1}

However, if I instead change the string a to contain a leading 0, then it breaks down:

a = '{"index":01}'
import json
json.loads(a)
>>> JSONDecodeError: Expecting ',' delimiter

I believe this is due to the fact that it is invalid JSON if an integer begins with a leading zero as described in this thread.

Is there a way to remedy this? If not, then I guess the best way is to remove any leading zeroes by a regex from the string first, then convert to json?

N08
  • is only the `index` key affected by this? – gold_cy Feb 14 '19 at 11:37
  • @aws_apprentice No, the above is just an example. My JSON-string is very long, so ideally a function should be written to put quotes around any number in the string (I guess?) – N08 Feb 14 '19 at 11:38
  • Can you share a pastebin link with your json string? – ninesalt Feb 14 '19 at 11:41
  • You have to use your own decoder. See the similar post here: https://stackoverflow.com/questions/45068797/how-to-convert-string-int-json-into-real-int-with-json-loads – user_D_A__ Feb 14 '19 at 11:49
  • @AbinayaDevarajan That a number cannot start with `0` unless it is only `0` or starts with `0.` is quite heavily baked into the `json` module. The technique described in the question above only works if `json` has recognised that the number is a valid JSON number literal. It's very hard to get around this limitation: you have to forgo the C implementation of the internal `json.scanner` module and modify a global regex in the Python implementation of the scanner, which is a pretty ugly hack. – Dunes Feb 14 '19 at 11:59
  • @Dunes, you are right. I tried this hack: `value = json.loads(json.dumps(c))`. It seems to give the expected result, so I am confused. – user_D_A__ Feb 14 '19 at 12:28

3 Answers

First, using regex on JSON is evil, almost as bad as killing a kitten.

If you want to represent 01 as a valid JSON value, then consider using this structure:

a = '{"index" : "01"}'
import json
json.loads(a)

If you need the string literal 01 to behave like a number, then consider just casting it to an integer in your Python script.
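
A minimal sketch of that casting step, assuming the source can be changed to emit the zero-padded value as a string (the key name `index` is taken from the question):

```python
import json

# Assumption: the producer emits the zero-padded value as a JSON string.
data = json.loads('{"index": "01"}')

# int() accepts leading zeros when parsing a decimal string.
index = int(data["index"])
print(index)
```

The padded text stays available in `data["index"]` if the original formatting matters later.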

Tim Biegeleisen
  • My JSON string is very long, and it isn't really feasible to do this manually. Is there a way to automate it? – N08 Feb 14 '19 at 11:37
  • I think the problem may lie with the source of your JSON string, rather than your Python code. Any chance you can change the export so that the "numbers" are treated as strings? – Tim Biegeleisen Feb 14 '19 at 11:39
  • Why would using regex be evil? If the string doesn't have any other numbers then I think regex would be the easiest approach to this – ninesalt Feb 14 '19 at 11:46
  • @ninesalt The JSON content might be nested. Can't give an edge case where regex would fail off the top of my head, but I would rather avoid that chance. – Tim Biegeleisen Feb 14 '19 at 12:04
  • I get the JSON-string from a library, so I can't control it unfortunately. However, it isn't nested, so maybe it is best to write a regex to take care of the integers? – N08 Feb 14 '19 at 12:49
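
For reference, the regex idea raised in these comments could be sketched roughly as follows. This is only an illustration under assumptions, not a tested general solution: the helper name `strip_leading_zeros` and the sample input are made up, and the pattern assumes the text is otherwise well-formed JSON. It skips over complete string literals so that values such as "007" are left alone, which is one of the cases where a naive regex would go wrong.

```python
import json
import re

# Either a complete JSON string literal (group 1, kept untouched), or a run of
# redundant leading zeros in a number (optional sign in group 2).
# The lookbehind avoids touching zeros after a digit or a decimal point.
_PATTERN = re.compile(r'("(?:[^"\\]|\\.)*")|(?<![\d.])(-?)0+(?=\d)')


def _fix(match):
    if match.group(1) is not None:
        return match.group(1)  # string literal: return it unchanged
    return match.group(2)      # keep only the sign, drop the zeros


def strip_leading_zeros(text):
    """Hypothetical helper: remove leading zeros from numbers outside strings."""
    return _PATTERN.sub(_fix, text)


raw = '{"index": 01, "code": "007", "ratio": 0.5, "count": 0010}'
fixed = strip_leading_zeros(raw)
data = json.loads(fixed)
print(fixed)
print(data)
```

Because string literals are consumed whole before the number branch is tried, keys and string values are never rewritten; malformed input (for example an unterminated string) is outside what this sketch handles.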

Please see the post "How to convert string int JSON into real int with json.loads" linked above. You need to use your own version of the decoder.

More information can be found in the simplejson documentation on GitHub: https://github.com/simplejson/simplejson/blob/master/index.rst

import json

c = '{"value": 02}'
value = json.loads(json.dumps(c))
print(value)

This seems to work, which is strange:

> >>> c = '{"value": 02}'
> >>> import json
> >>> value = json.loads(json.dumps(c))
> >>> print(value)
> {"value": 02}
> >>> c = '{"value": 0002}'
> >>> value = json.loads(json.dumps(c))
> >>> print(value)
> {"value": 0002}

As @Dunes pointed out, loads here just returns the original string rather than a parsed object, so this is not a valid solution.

However, demjson seems to decode it properly and is an alternative: https://pypi.org/project/demjson/

>>> c = '{"value": 02}'
>>> import demjson
>>> demjson.decode(c)
{'value': 2}
user_D_A__
  • Because `c` is a string and not a dictionary containing an int. Strings don't have the same restrictions that numbers do. Even if you dumped a dictionary it still wouldn't help, as the `json` module will only emit valid JSON, which `02` is not. – Dunes Feb 14 '19 at 12:35
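
A small sketch of what this comment describes, using the same variable `c` as the answer (the names `encoded`, `decoded` and `parsed_directly` are only for illustration): dumping a str and loading it back returns the same str, so nothing is actually parsed.

```python
import json

c = '{"value": 02}'

# json.dumps on a str produces a JSON *string literal* wrapping the text...
encoded = json.dumps(c)
# ...so json.loads simply unwraps it again and returns the original text.
decoded = json.loads(encoded)

print(type(decoded).__name__)
print(decoded == c)

# Parsing the text itself still fails because 02 is not a valid JSON number.
try:
    json.loads(c)
    parsed_directly = True
except json.JSONDecodeError:
    parsed_directly = False
print(parsed_directly)
```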

A leading 0 in a number literal in JSON is invalid unless the number literal is only the character 0 or starts with 0. (a zero followed by a decimal point). The Python json module is strict and will not accept such number literals, in part because a leading 0 is sometimes used to denote octal notation rather than decimal notation. Deserialising such numbers could lead to unintended programming errors: should 010 be parsed as the number 8 (in octal notation) or as 10 (in decimal notation)?
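
As a quick illustration of both points, here is a minimal sketch using only the standard library (the helper name `is_valid_json` is made up for this example):

```python
import json


def is_valid_json(text):
    """Hypothetical helper: True if the json module accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# "0", "0.5" and "10" are valid number literals; "010" and "-01" are not.
for literal in ["0", "0.5", "10", "010", "-01"]:
    print(literal, is_valid_json(literal))

# The ambiguity a strict parser avoids: the same digits, two readings.
as_decimal = int("010")     # read as base 10
as_octal = int("010", 8)    # read as base 8
print(as_decimal, as_octal)
```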

You can create a decoder that will do what you want, but you will need to heavily hack the json module or rewrite much of its internals. Either way, you will see a performance slowdown, as you will no longer be using the C implementation of the module.

Below is an implementation that can decode JSON which contains numbers with any number of leading zeros.

import json
import re
import threading

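# a more lenient number regex (modified from json.scanner.NUMBER_RE)
# \d+ rather than \d* so the pattern cannot match an empty string, which would
# otherwise shadow the scanner's NaN/Infinity handling and fail inside int('')
NUMBER_RE = re.compile(
    r'(-?(?:\d+))(\.\d+)?([eE][-+]?\d+)?',
    (re.VERBOSE | re.MULTILINE | re.DOTALL))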


# we are going to be messing with the internals of `json.scanner`. As such we
# want to return it to its initial state when we're done with it, but we need to
# do so in a thread safe way.
_LOCK = threading.Lock()
def thread_safe_py_make_scanner(context, *, number_re=json.scanner.NUMBER_RE):
    with _LOCK:
        original_number_re = json.scanner.NUMBER_RE
        try:
            json.scanner.NUMBER_RE = number_re
            return json.scanner._original_py_make_scanner(context)
        finally:
            json.scanner.NUMBER_RE = original_number_re

json.scanner._original_py_make_scanner = json.scanner.py_make_scanner
json.scanner.py_make_scanner = thread_safe_py_make_scanner


class MyJsonDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # overwrite the stricter scan_once implementation
        self.scan_once = json.scanner.py_make_scanner(self, number_re=NUMBER_RE)


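d = MyJsonDecoder()
n = d.decode('010')
assert n == 10

# check the normal route still raises an error
try:
    json.loads('010')
except json.JSONDecodeError:
    pass
else:
    raise AssertionError('expected json.loads to reject 010')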

I would stress that you shouldn't rely on this as a proper solution. Rather, it's a quick hack to help you decode malformed JSON that is nearly, but not quite valid. It's useful if recreating the JSON in a valid form is not possible for some reason.

Dunes