13

I'm trying to convert a string which represents a JSON object to a real JSON object using json.loads but it doesn't convert the integers:

(in the initial string, integers are always strings)

$> python
Python 2.7.9 (default, Aug 29 2016, 16:00:38)
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> c = '{"value": "42"}'
>>> json_object = json.loads(c, parse_int=int)
>>> json_object
{u'value': u'42'}
>>> json_object['value']
u'42'
>>>

Instead of {u'value': u'42'} I'd like to get {u'value': 42}. I know I can walk the whole object myself, but I'd rather not do that by hand, since it seems inefficient when the parse_int argument already exists (https://docs.python.org/2/library/json.html#json.loads).

Thanks to Pierce's suggestion, this now works:

Python 2.7.9 (default, Aug 29 2016, 16:00:38)
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>>
>>> class Decoder(json.JSONDecoder):
...     def decode(self, s):
...         result = super(Decoder, self).decode(s)
...         return self._decode(result)
...     def _decode(self, o):
...         if isinstance(o, str) or isinstance(o, unicode):
...             try:
...                 return int(o)
...             except ValueError:
...                 try:
...                     return float(o)
...                 except ValueError:
...                     return o
...         elif isinstance(o, dict):
...             return {k: self._decode(v) for k, v in o.items()}
...         elif isinstance(o, list):
...             return [self._decode(v) for v in o]
...         else:
...             return o
...
>>>
>>> c = '{"value": "42", "test": "lolol", "abc": "43.4",  "dcf": 12, "xdf": 12.4}'
>>> json.loads(c, cls=Decoder)
{u'test': u'lolol', u'dcf': 12, u'abc': 43.4, u'value': 42, u'xdf': 12.4}
Léo
  • 4
    Why is it `"42"` instead of `42` in the first place? – Stefan Pochmann Jul 12 '17 at 23:01
  • 2
    Well your JSON example `'{"value": "42"}'` has 42 as a string — not an int. Your best bet is either to fix the data coming in or (if that's not feasible) write a [custom JSON decoder](https://docs.python.org/2/library/json.html). – Pierce Darragh Jul 12 '17 at 23:01
  • The `parse_int` option is only used for parts of the JSON that have the syntax of an integer. The double quotes make it a string, not an integer, so it doesn't use the `parse_int` option. – Barmar Jul 12 '17 at 23:07
  • @Barmar I'm a bit lost on that functionality. From all JSON I've worked with, `42` would be an int without `parse_int` and `"42"` would be a string. Do you have a link for a use-case on `parse_int`? – roganjosh Jul 12 '17 at 23:11
  • What you really need is a `parse_str` option, which would allow you to supply a custom function that returns an `int` if the string contains a valid integer. Unfortunately, that option doesn't exist. – Barmar Jul 12 '17 at 23:11
  • @roganjosh The documentation he linked to says "parse_int, if specified, will be called with the string of every JSON int to be decoded." So if the JSON contains `42`, the `parse_int` function will be called to parse it. But `"42"` is not a JSON int, so this option is not used for it. – Barmar Jul 12 '17 at 23:13
  • 2
    @roganjosh The documentation suggests this use case: **This can be used to use another datatype or parser for JSON integers (e.g. float).** – Barmar Jul 12 '17 at 23:14
  • @Barmar aha, I should have read further down. Your second comment clarifies, thanks. – roganjosh Jul 12 '17 at 23:15
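To make the point from the comments concrete, here is a minimal Python 3 sketch of when `parse_int` is actually called. The key names `bare` and `quoted` are made up for illustration; `float` is the alternative parser the documentation itself suggests.

```python
import json

doc = '{"bare": 42, "quoted": "42"}'

# parse_int receives the text of each JSON integer literal (the bare 42 here).
# It is never called for the contents of a JSON string such as "42".
parsed = json.loads(doc, parse_int=float)
print(parsed)  # {'bare': 42.0, 'quoted': '42'}
```

The quoted value stays a string, which is why passing `parse_int=int` has no visible effect on the data in the question.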

5 Answers

11

In addition to Pierce's answer: I think you can use the object_hook parameter of json.loads instead of cls, so you don't need to walk the JSON object twice.

For example:

def _decode(o):
    # Note the "unicode" part is only for python2
    if isinstance(o, str) or isinstance(o, unicode):
        try:
            return int(o)
        except ValueError:
            return o
    elif isinstance(o, dict):
        return {k: _decode(v) for k, v in o.items()}
    elif isinstance(o, list):
        return [_decode(v) for v in o]
    else:
        return o

# Then you can do:
json.loads(c, object_hook=_decode)

As @ZhanwenChen pointed out in a comment, the code above is for Python 2. For Python 3, remove the `or isinstance(o, unicode)` part of the first if condition.
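For Python 3, the same idea might look like the sketch below. It relies on the fact that json.loads calls object_hook on every decoded dict, innermost first, so the hook only has to look at the direct values of each dict (plus any lists among them). The names `_to_int` and `int_hook` are hypothetical, and a top-level JSON array would not pass through the hook at all.

```python
import json

def _to_int(value):
    """Convert integer-looking strings to int; recurse into lists only."""
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return value
    if isinstance(value, list):
        return [_to_int(v) for v in value]
    # Nested dicts were already handled by an earlier object_hook call.
    return value

def int_hook(d):
    # Called by json.loads once per decoded JSON object (dict).
    return {k: _to_int(v) for k, v in d.items()}

c = '{"value": "42", "nested": {"n": "7", "items": ["1", "x", 2]}}'
result = json.loads(c, object_hook=int_hook)
print(result)  # {'value': 42, 'nested': {'n': 7, 'items': [1, 'x', 2]}}
```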

juanra
  • 1
    in [Python 3, the `str` class subsumed the `unicode` class](https://docs.python.org/3/howto/unicode.html#python-s-unicode-support), so your code would raise because `unicode` is an undefined variable. Please edit your answer. – Zhanwen Chen Feb 08 '19 at 15:09
  • In the dict comp, if you might want to also decode the key you can use `{_decode(k): ...}` – Anm Jun 16 '22 at 04:45
9

As we established in the comments, there is no existing functionality to do this for you. And I read through the documentation and some examples on the JSONDecoder and it also appears to not do what you want without processing the data twice.

The best option, then, is something like this:

import json

class Decoder(json.JSONDecoder):
    def decode(self, s):
        result = super().decode(s)  # result = super(Decoder, self).decode(s) for Python 2.x
        return self._decode(result)

    def _decode(self, o):
        # Python 3 has no unicode type; on Python 2.x test isinstance(o, (str, unicode)) instead.
        if isinstance(o, str):
            try:
                return int(o)
            except ValueError:
                return o
        elif isinstance(o, dict):
            return {k: self._decode(v) for k, v in o.items()}
        elif isinstance(o, list):
            return [self._decode(v) for v in o]
        else:
            return o

This has the downside of processing the JSON object twice — once in the super().decode(s) call, and again to recurse through the entire structure to fix things. Also note that this will convert anything which looks like an integer into an int. Be sure to account for this appropriately.

To use it, you do e.g.:

>>> c = '{"value": "42"}'
>>> json.loads(c, cls=Decoder)
{'value': 42}
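
The caveat above about integer-looking strings matters for values such as postal codes, where int("02134") silently drops the leading zero. One possible way to account for it, sketched here with an object_hook rather than the Decoder class, is to convert only the keys you expect to hold integers. The allowlist `INT_KEYS`, the function name, and the sample data are all assumptions for illustration.

```python
import json

INT_KEYS = {"value", "count"}  # hypothetical allowlist of keys to convert

def convert_selected(d):
    """object_hook that converts only allowlisted string values to int."""
    out = {}
    for k, v in d.items():
        if k in INT_KEYS and isinstance(v, str):
            try:
                v = int(v)
            except ValueError:
                pass  # leave non-numeric strings untouched
        out[k] = v
    return out

doc = '{"value": "42", "zip": "02134", "count": "n/a"}'
selected = json.loads(doc, object_hook=convert_selected)
print(selected)  # {'value': 42, 'zip': '02134', 'count': 'n/a'}
```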
Neuron
Pierce Darragh
  • Thank you Pierce, your code seems right but it raises an error on `result = super().decode(s)` – Léo Jul 13 '17 at 17:09
  • @Léo I wrote this in Python 3; if you're using Python 2 you'd need `result = super(Decoder, self).decode(s)`. If that's not the issue, can you tell me what error you're seeing and I can try to help you? – Pierce Darragh Jul 13 '17 at 17:11
  • 1
    @Léo ah! I didn't realize it was doing unicode. I've updated my answer to accommodate the unicode handling, and it appears to work fine for me now! – Pierce Darragh Jul 13 '17 at 17:22
  • Well. You saved my Day! Thx a lot. :) – SkunKz Nov 13 '18 at 09:09
4

For my solution I used object_hook, which is useful when you have nested JSON:

>>> import json
>>> json_data = '{"1": "one", "2": {"-3": "minus three", "4": "four"}}'
>>> py_dict = json.loads(json_data, object_hook=lambda d: {int(k) if k.lstrip('-').isdigit() else k: v for k, v in d.items()})

>>> py_dict
{1: 'one', 2: {-3: 'minus three', 4: 'four'}}

The filter above only converts JSON keys to int. To convert values as well, you can apply int(v) if v.lstrip('-').isdigit() else v to the values, provided they are strings.
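A sketch of a hook that handles both keys and values in Python 3 could look like this. It swaps the isdigit() test for try/except around int(), because a string such as '--3' passes lstrip('-').isdigit() but is rejected by int(). It also skips non-string values such as nested dicts. The helper names are hypothetical.

```python
import json

def maybe_int(x):
    """Return int(x) for integer-looking strings, otherwise x unchanged."""
    if not isinstance(x, str):
        return x  # nested dicts, lists, numbers, None, ...
    try:
        return int(x)
    except ValueError:
        return x

def keys_and_values_hook(d):
    return {maybe_int(k): maybe_int(v) for k, v in d.items()}

json_data = '{"1": "one", "2": {"-3": "30", "4": "--4"}}'
py_dict = json.loads(json_data, object_hook=keys_and_values_hook)
print(py_dict)  # {1: 'one', 2: {-3: 30, 4: '--4'}}
```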

Neuron
GooDeeJAY
0

Building on the answers from @juanra and @Pierce Darragh, I added a conversion of boolean values from strings. My example is a dict converted from XML that contains 'true' and 'false', which the suggested answers won't load as the booleans True and False.

def _decode(o):
    if isinstance(o, str):
        if o.lower() == 'true':
            return True
        elif o.lower() == 'false':
            return False
        else:
            try:
                return int(o)
            except ValueError:
                return o
    elif isinstance(o, dict):
        return {k: _decode(v) for k, v in o.items()}
    elif isinstance(o, list):
        return [_decode(v) for v in o]
    else:
        return o

Depending on what you need, you can also treat other strings as booleans; see the question "Converting from a string to boolean in Python?".
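As a rough Python 3 sketch of that extension, the accepted spellings can live in two sets that are checked before the int conversion. The particular strings chosen here, the function names, and the sample data are assumptions; this version only looks at direct values and flat lists within each dict.

```python
import json

TRUE_STRINGS = {"true", "yes", "on"}     # hypothetical choice of spellings
FALSE_STRINGS = {"false", "no", "off"}   # hypothetical choice of spellings

def decode_scalar(s):
    """Map a string to bool or int where it matches, else return it unchanged."""
    lowered = s.lower()
    if lowered in TRUE_STRINGS:
        return True
    if lowered in FALSE_STRINGS:
        return False
    try:
        return int(s)
    except ValueError:
        return s

def bool_int_hook(d):
    out = {}
    for k, v in d.items():
        if isinstance(v, str):
            v = decode_scalar(v)
        elif isinstance(v, list):
            v = [decode_scalar(x) if isinstance(x, str) else x for x in v]
        out[k] = v
    return out

xml_like = '{"enabled": "Yes", "debug": "false", "retries": "3", "name": "svc"}'
settings = json.loads(xml_like, object_hook=bool_int_hook)
print(settings)  # {'enabled': True, 'debug': False, 'retries': 3, 'name': 'svc'}
```

Strings like "1" and "0" are deliberately left out of the sets, since they would otherwise compete with the integer conversion.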

vielfarbig
-3
def convert_to_int(params):
    for key in params.keys():
        if isinstance(params[key], dict):
            convert_to_int(params[key])
        elif isinstance(params[key], list):
            for i, item in enumerate(params[key]):
                if not isinstance(item, (dict, list)):
                    # Assign by index; rebinding `item` would leave the list unchanged.
                    params[key][i] = int(item)
                else:
                    convert_to_int(item)
        else:
            params[key] = int(params[key])
    return params


print(convert_to_int({'a': '3', 'b': {'c': '4', 'd': {'e': 5}, 'f': [{'g': '6'}]}}))
Yang
  • The issue with this is that the OP wanted to parse the value `"42"` into an `int` in Python, which your code does not account for. – Pierce Darragh Jul 13 '17 at 17:14
  • convert = lambda x: {x.keys()[0]: int(x.values()[0])} convert(json.loads('{"value": "42"}')) – Yang Jul 18 '17 at 06:02
  • Your lambda suggestion only works for dictionaries with exactly one value. It does not solve for multi-value dictionaries or arrays, and it also does not address the Unicode problem present in Python 2. Further, it isn't advisable to set the result of a `lambda` expression to a variable; just define a function and it becomes significantly easier to maintain. See my (accepted) answer for a more robust solution. – Pierce Darragh Jul 18 '17 at 06:07
  • how about this one? – Yang Jul 18 '17 at 06:34
  • Better, but `for key in params.keys()` assumes that `params` is a `dict`, which it may not be since you call the function recursively. Additionally, you iterate through the keys of the dictionary and then continuously do `params[key]`. Why not `for key, value in params.items()` (or `params.iteritems()` in Python 2.x)? Additionally, much of this can be done with comprehensions — like what I did in my solution. I think list/dict comprehensions lead to easier-to-read code in cases like this. – Pierce Darragh Jul 18 '17 at 06:39