39

There is a JSON like this:

{
  "P1": "ss",
  "Id": 1234,
  "P2": {
      "P1": "cccc"
  },
  "P3": [
      {
          "P1": "aaa"
      }
  ]
}

How can I find all of P1's values without iterating over the whole JSON?

P.S.: P1 can be anywhere in the JSON.

If no method can do this, can you tell me how to iterate through the JSON?

martineau
lichengwu
  • If it can be anywhere in the set of nested structures, then you're going to have to look through all of it. That's just how reality works. – Amber Dec 27 '12 at 03:18
  • Maybe this is what you need? https://www.p6r.com/articles/2008/05/06/xslt-and-xpath-for-json/ – Roman Newaza Dec 27 '12 at 03:22
  • see my json2xml solution at https://stackoverflow.com/questions/38361224/python-json-load-returning-string-instead-of-dictionary/64575834#64575834 – Golden Lion Oct 28 '20 at 15:20

9 Answers

33

As I said in my other answer, I don't think there is a way of finding all values associated with the "P1" key without iterating over the whole structure. However, I've come up with an even better way to do that, which occurred to me while looking at @Mike Brennan's answer to another JSON-related question, "How to get string objects instead of Unicode from JSON?"

The basic idea is to use the object_hook parameter that json.loads() accepts just to watch what is being decoded and check for the sought-after value.

Note: object_hook is only called for JSON objects (i.e. things enclosed in curly braces {}), so this finds the key only where it appears inside an object, as in your sample.

from __future__ import print_function
import json

def find_values(id, json_repr):
    results = []

    def _decode_dict(a_dict):
        try:
            results.append(a_dict[id])
        except KeyError:
            pass
        return a_dict

    json.loads(json_repr, object_hook=_decode_dict) # Return value ignored.
    return results

json_repr = '{"P1": "ss", "Id": 1234, "P2": {"P1": "cccc"}, "P3": [{"P1": "aaa"}]}'
print(find_values('P1', json_repr))

(Python 3) output:

['cccc', 'aaa', 'ss']
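
The order of that output follows from how the decoder works: object_hook is called each time a JSON object finishes decoding, so nested objects are reported before the objects that contain them. A minimal sketch to illustrate this (the names `trace_hook` and `calls` are made up for this example, and the sample is shortened):

```python
import json

calls = []

def trace_hook(a_dict):
    # Hypothetical helper: record the keys of every object as it is decoded.
    calls.append(sorted(a_dict))
    return a_dict

json.loads('{"P1": "ss", "P2": {"P1": "cccc"}, "P3": [{"P1": "aaa"}]}',
           object_hook=trace_hook)
print(calls)  # [['P1'], ['P1'], ['P1', 'P2', 'P3']]
```

The two inner objects are seen first and the outermost object last, which is why 'ss' ends up at the end of the results.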
martineau
15

I had the same issue just the other day. I wound up searching through the entire object, accounting for both lists and dicts. The following snippet lets you search for the first occurrence of multiple keys.

import json

def deep_search(needles, haystack):
    found = {}
    # Accept a single key as well as a list of keys.
    if not isinstance(needles, list):
        needles = [needles]

    if isinstance(haystack, dict):
        for needle in needles:
            if needle in haystack:
                found[needle] = haystack[needle]
            else:
                # Not at this level: look inside every nested value.
                for value in haystack.values():
                    found.update(deep_search(needle, value))
    elif isinstance(haystack, list):
        for node in haystack:
            found.update(deep_search(needles, node))
    return found

json_string = '{"P1": "ss", "Id": 1234, "P2": {"P1": "cccc"}, "P3": [{"P1": "aaa"}]}'
print(deep_search(["P1", "P3"], json.loads(json_string)))

It returns a dict with the keys being the keys searched for. Haystack is expected to be a Python object already, so you have to do json.loads before passing it to deep_search.

Any comments for optimization are welcomed!
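
If every occurrence is wanted rather than one value per key, the same walk can collect lists instead. This is only a sketch of that idea, not part of the answer above; `deep_search_all` is a hypothetical name and it takes the needles as a set:

```python
import json
from collections import defaultdict

def deep_search_all(needles, haystack, found=None):
    # Hypothetical variant: gather every value for each needle, not just one.
    if found is None:
        found = defaultdict(list)
    if isinstance(haystack, dict):
        for key, value in haystack.items():
            if key in needles:
                found[key].append(value)
            # Keep descending even after a match, so nested hits are kept too.
            deep_search_all(needles, value, found)
    elif isinstance(haystack, list):
        for node in haystack:
            deep_search_all(needles, node, found)
    return found

json_string = '{"P1": "ss", "Id": 1234, "P2": {"P1": "cccc"}, "P3": [{"P1": "aaa"}]}'
result = deep_search_all({"P1", "P3"}, json.loads(json_string))
print(dict(result))  # {'P1': ['ss', 'cccc', 'aaa'], 'P3': [[{'P1': 'aaa'}]]}
```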

Sean Linehan
  • I know that this is an old answer but I just wanted to say that I adapted your solution by checking `len(needles) == len(found)` in both loops to cut the execution short in the case that I had already found all keys. – vonludi Jan 20 '18 at 09:22
  • Awesome answer. Just what I was looking for. – Inaam Ilahi Oct 20 '22 at 09:17
11

My approach to this problem would be different.

Since JSON itself doesn't offer a depth-first search, convert the JSON to a Python object, feed it to an XML decoder, and then extract the nodes you are searching for.

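from xml.dom.minidom import parseString
import json
try:
    import xmlrpclib  # Python 2
except ImportError:
    import xmlrpc.client as xmlrpclib  # renamed in Python 3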
def bar(somejson, key):
    def val(node):
        # Searches for the next Element Node containing Value
        e = node.nextSibling
        while e and e.nodeType != e.ELEMENT_NODE:
            e = e.nextSibling
        return (e.getElementsByTagName('string')[0].firstChild.nodeValue if e 
                else None)
    # parse the JSON as XML
    foo_dom = parseString(xmlrpclib.dumps((json.loads(somejson),)))
    # and then search all the name tags which are P1's
    # and use the val user function to get the value
    return [val(node) for node in foo_dom.getElementsByTagName('name') 
            if node.firstChild.nodeValue in key]

bar(foo, 'P1')
[u'cccc', u'aaa', u'ss']
bar(foo, ('P1','P2'))
[u'cccc', u'cccc', u'aaa', u'ss']
martineau
Abhijit
  • What is `xmlrpclib`? I think python 3 doesn't support it. I am getting error. – Jaffer Wilson Mar 16 '17 at 07:57
  • @JafferWilson: The [Python 2 documentation](https://docs.python.org/2/library/xmlrpclib.html) says "The `xmlrpclib` module has been renamed to `xmlrpc.client` in Python 3." The Python 3 documentation for it being [here](https://docs.python.org/3/library/xmlrpc.client.html). – martineau Nov 07 '17 at 13:12
  • unfortunately, it seems to be working only when the value is string or I am doing something wrong. – viveksinghggits Dec 04 '18 at 13:40
  • @viveksinghggits: In python there is no json data type. You have to pass the json as string. – Abhijit Dec 04 '18 at 23:22
  • @Abhijit I agree but what I meant was `bar(foo, 'Id')` is resulting in an error, because the value for key `Id` is int. – viveksinghggits Dec 05 '18 at 04:19
  • @viveksinghggits: FWIW, the code in [my answer](https://stackoverflow.com/a/14059645/355230) doesn't care what the type the key value is (nor require converting the JSON to XML). – martineau Sep 15 '21 at 13:23
10

Using the json module to convert the JSON to Python objects and then going through them recursively works best. This example also goes through lists.

import json
def get_all(myjson, key):
    if type(myjson) == str:
        myjson = json.loads(myjson)
    if type(myjson) is dict:
        for jsonkey in myjson:
            if type(myjson[jsonkey]) in (list, dict):
                get_all(myjson[jsonkey], key)
            elif jsonkey == key:
                print(myjson[jsonkey])
    elif type(myjson) is list:
        for item in myjson:
            if type(item) in (list, dict):
                get_all(item, key)
jdotjdot
6

Converting the JSON to Python and recursively searching is by far the easiest:

import json

def findall(v, k):
    # Only dicts are searched; lists are skipped (see the note below).
    if isinstance(v, dict):
        for k1 in v:
            if k1 == k:
                print(v[k1])
            findall(v[k1], k)

findall(json.loads(a), 'P1')

(where a is the string)

The example code ignores arrays. Adding that is left as an exercise.
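
One possible way to do that exercise is sketched below. It is an assumption about what the extension could look like, not the answerer's code: `findall_with_arrays` is a hypothetical name, and it returns the matches instead of printing them.

```python
import json

def findall_with_arrays(v, k):
    # Hypothetical extension of findall: also walks lists and returns matches.
    results = []
    if isinstance(v, dict):
        for k1, value in v.items():
            if k1 == k:
                results.append(value)
            results.extend(findall_with_arrays(value, k))
    elif isinstance(v, list):
        for item in v:
            results.extend(findall_with_arrays(item, k))
    return results

a = '{"P1": "ss", "Id": 1234, "P2": {"P1": "cccc"}, "P3": [{"P1": "aaa"}]}'
print(findall_with_arrays(json.loads(a), 'P1'))  # ['ss', 'cccc', 'aaa']
```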

Michael Lorton
5

Bearing in mind that json is simply a string, using regular expressions with look-ahead and look-behind can accomplish this task very quickly.

Typically, the json would have been extracted from a request to external api, so code to show how that would work has been included but commented out.

import re
#import requests
#import json

#r1 = requests.get( ... url to some api ...)
#JSON = r1.text  # keep the raw JSON text; str(json.loads(...)) would produce a single-quoted Python repr
JSON = """
 {
  "P1": "ss",
  "Id": 1234,
  "P2": {
      "P1": "cccc"
  },
  "P3": [
     {
          "P1": "aaa"
     }
  ]
 }
"""
rex1 = re.compile(r'(?<="P1": ")[a-zA-Z_\- ]+(?=")')  # raw string avoids invalid-escape warnings
rex2 = rex1.findall(JSON)
print(rex2)

#['ss', 'cccc', 'aaa']
Tony Mobbs
3

I don't think there's any way of finding all values associated with P1 without iterating over the whole structure. Here's a recursive way to do it that first deserializes the JSON object into an equivalent Python object. To simplify things most of the work is done via a recursive private nested function.

import json

try:
    STRING_TYPE = basestring
except NameError:
    STRING_TYPE = str  # Python 3

def find_values(id, obj):
    results = []

    def _find_values(id, obj):
        try:
            for key, value in obj.items():  # dict?
                if key == id:
                    results.append(value)
                elif not isinstance(value, STRING_TYPE):
                    _find_values(id, value)
        except AttributeError:
            pass

        try:
            for item in obj:  # iterable?
                if not isinstance(item, STRING_TYPE):
                    _find_values(id, item)
        except TypeError:
            pass

    if not isinstance(obj, STRING_TYPE):
        _find_values(id, obj)
    return results

json_repr = '{"P1": "ss", "Id": 1234, "P2": {"P1": "cccc"}, "P3": [{"P1": "aaa"}]}'

obj = json.loads(json_repr)
print(find_values('P1', obj))
martineau
1

You could also use a generator to search the object after json.load().

Code example from my answer here: https://stackoverflow.com/a/39016088/5250939

def item_generator(json_input, lookup_key):
    if isinstance(json_input, dict):
        for k, v in json_input.items():  # use .iteritems() on Python 2
            if k == lookup_key:
                yield v
            else:
                for child_val in item_generator(v, lookup_key):
                    yield child_val
    elif isinstance(json_input, list):
        for item in json_input:
            for item_val in item_generator(item, lookup_key):
                yield item_val
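
Because this is a generator, results are produced lazily, so you can stop after the first match without walking the rest of the structure. A short usage sketch, assuming Python 3 (the function is restated with `.items()` and `yield from` so the block runs on its own; the sample string is the one from the question):

```python
import json

def item_generator(json_input, lookup_key):
    # Python 3 restatement of the generator above, so this block is self-contained.
    if isinstance(json_input, dict):
        for k, v in json_input.items():
            if k == lookup_key:
                yield v
            else:
                yield from item_generator(v, lookup_key)
    elif isinstance(json_input, list):
        for item in json_input:
            yield from item_generator(item, lookup_key)

data = json.loads('{"P1": "ss", "Id": 1234, "P2": {"P1": "cccc"}, "P3": [{"P1": "aaa"}]}')
matches = item_generator(data, 'P1')
first = next(matches)   # only as much of the structure as needed is visited
rest = list(matches)    # drain the remaining matches
print(first, rest)      # ss ['cccc', 'aaa']
```

Note that, like the original, this does not descend into the value of a matching key.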
Bo Sunesen
0

The question is old, but none of the answers covered everything, so here is my solution.

What it does:

  • recursive algorithm;
  • list search;
  • object search;
  • returns all the results it finds in the tree;
  • includes the name of the parent key in each result's key.

Suggestions:

  • study Depth First Search and Breadth First Search;
  • if your JSON is very large, recursion may become a problem; look into an approach that uses an explicit stack instead.

class Tools:
    @staticmethod
    def search_into_json_myversion(jsondata, searchkey, parentkeyname: str = None) -> list:
        found = []

        if isinstance(jsondata, list):
            for element in jsondata:
                # Items of a list keep the parent key of the list itself.
                found += Tools.search_into_json_myversion(element, searchkey, parentkeyname=parentkeyname)
        elif isinstance(jsondata, dict):
            if searchkey in jsondata:
                pathkey = parentkeyname + '->' + searchkey if parentkeyname is not None else searchkey
                found.append({pathkey: jsondata[searchkey]})
            else:
                # Key not at this level: descend, remembering the current key as parent.
                for key, value in jsondata.items():
                    found += Tools.search_into_json_myversion(value, searchkey, parentkeyname=key)

        return found
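
For the stack suggestion above, a non-recursive walk could look roughly like this. It is a simplified sketch under my own assumptions: `search_iterative` is a hypothetical name, it returns only the values (no parent-key path), and because the stack is last-in-first-out the results are not in document order.

```python
import json

def search_iterative(jsondata, searchkey):
    # Hypothetical name; an explicit stack replaces the recursive calls,
    # so nesting depth is not limited by Python's recursion limit.
    found = []
    stack = [jsondata]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            for key, value in node.items():
                if key == searchkey:
                    found.append(value)
                stack.append(value)  # keep searching inside the value as well
        elif isinstance(node, list):
            stack.extend(node)
    return found

data = json.loads('{"P1": "ss", "Id": 1234, "P2": {"P1": "cccc"}, "P3": [{"P1": "aaa"}]}')
print(sorted(search_iterative(data, 'P1')))  # ['aaa', 'cccc', 'ss']
```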