I have a Python object made of nested dicts and lists, and I need the values of several keys that appear at various depths. I found an answer that uses recursive generators to pull the values for one key, but not for multiple keys. Here's the code:

import json

with open('data.json') as f:
    json_data = json.load(f)

def find_key(obj, key):
    if isinstance(obj, dict):
        yield from iter_dict(obj, key, [])
    elif isinstance(obj, list):
        yield from iter_list(obj, key, [])

def iter_dict(d, key, indices):
    for k, v in d.items():
        if k == key:
            yield indices + [k], v
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def iter_list(seq, key, indices):
    for k, v in enumerate(seq):
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])


for c in find_key(json_data, 'customer_count'):
    print(c)

Result:

(['calendar', 'weeks', 0, 'days', 1, 'availabilities', 0, 'customer_count'], 14)
(['calendar', 'weeks', 0, 'days', 2, 'availabilities', 0, 'customer_count'], 7)

Another post has an example that extracts multiple keys, but it stops at the first match for each key instead of recursing through the entire object:

[...]
keys = ("customer_count", "utc_start_at", "non_resource_bookable_capacity")
for k in keys:
    keypath, val = next(find_key(json_data, k))
    print("{!r}: {!r}".format(k, val))

Result:

'customer_count': 14
'utc_start_at': '2018-09-29T16:45:00+0000'
'non_resource_bookable_capacity': 18

How do I iterate through the entire object and extract the three keys shown above?

My desired result would look something like this:

'customer_count': 14
'utc_start_at': '2018-09-29T16:45:00+0000'
'non_resource_bookable_capacity': 18

'customer_count': 7
'utc_start_at': '2018-09-29T16:45:00+0000'
'non_resource_bookable_capacity': 25

sample json

benvc
TomAudre
  • So what's your desired result? Can you edit the question and show how your final result should look? – Tanmay jain Sep 28 '18 at 17:13
  • Possible duplicate of [Get multiple keys from json in Python](https://stackoverflow.com/questions/45334930/get-multiple-keys-from-json-in-python) – stovfl Sep 28 '18 at 17:40

1 Answer

The example function below searches a dict (including all nested dicts and lists) for key/value pairs whose key appears in a list of keys you want to find. It works through the object breadth-first, collecting every dict it encounters, and then checks each collected dict for matching keys.

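def find_key_value_pairs(q, keys, dicts=None):
    # First call: wrap the root object in a queue and start the list of
    # dicts to check (the root is only added if it is itself a dict).
    if dicts is None:
        dicts = [q] if isinstance(q, dict) else []
        q = [q]

    # Take the next container off the front of the queue.
    data = q.pop(0)
    if isinstance(data, dict):
        data = data.values()

    # Queue nested containers; remember every dict for the final key check.
    for d in data:
        if isinstance(d, (dict, list)):
            q.append(d)
            if isinstance(d, dict):
                dicts.append(d)

    # Keep going until every queued container has been visited.
    if q:
        return find_key_value_pairs(q, keys, dicts)

    return [(k, v) for d in dicts for k, v in d.items() if k in keys]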
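
As a rough sketch, the same breadth-first walk can also be written without recursion, which avoids Python's recursion limit on very large objects. The name `find_key_value_pairs_iter` is made up for this illustration, and it assumes the keys are hashable (strings, as in JSON) so they can go in a set.

```python
from collections import deque


def find_key_value_pairs_iter(obj, keys):
    """Breadth-first search for (key, value) pairs whose key is in `keys`."""
    wanted = set(keys)  # set membership is cheaper than scanning a list
    # Only dicts and lists can be walked; anything else yields no results.
    queue = deque([obj]) if isinstance(obj, (dict, list)) else deque()
    found = []

    while queue:
        current = queue.popleft()
        if isinstance(current, dict):
            # Record matches from this dict, then look at its values.
            found.extend((k, v) for k, v in current.items() if k in wanted)
            children = current.values()
        else:
            children = current  # current is a list

        # Queue nested containers for a later visit.
        queue.extend(c for c in children if isinstance(c, (dict, list)))

    return found
```

It visits dicts in the order they are discovered, so the result order should match the recursive version.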

The example below uses json.loads to convert a sample dataset similar to your JSON into a dict before passing it to the function.

import json

json_data = """
{"results_count": 2, "results": [{"utc_start_at": "2018-09-29T16:45:00+0000", "counts": {"customer_count": "14", "other_count": "41"}, "capacity": {"non-resource": {"non_resource_bookable_capacity": "18", "other_non_resource_capacity": "1"}, "resource_capacity": "10"}}, {"utc_start_at": "2018-10-29T15:15:00+0000", "counts": {"customer_count": "7", "other_count": "41"}, "capacity": {"non-resource": {"non_resource_bookable_capacity": "25", "other_non_resource_capacity": "1"}, "resource_capacity": "10"}}]}
"""
data = json.loads(json_data) # json_data is a placeholder for your json
keys = ['results_count', 'customer_count', 'utc_start_at', 'non_resource_bookable_capacity']
results = find_key_value_pairs(data, keys)
for k, v in results:
    print(f'{k}: {v}')
# results_count: 2
# utc_start_at: 2018-09-29T16:45:00+0000
# utc_start_at: 2018-10-29T15:15:00+0000
# customer_count: 14
# customer_count: 7
# non_resource_bookable_capacity: 18
# non_resource_bookable_capacity: 25
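
If you would rather have the values grouped record by record, as in the desired output in the question, a depth-first generator is one option. This is only a sketch: `find_keys` is a hypothetical multi-key variant of the question's `find_key`, and `sample` is made-up data shaped like the paths printed in the question, not your real JSON.

```python
def find_keys(obj, keys, path=()):
    """Yield (path, key, value) for every matching key, depth-first."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k in keys:
                yield path + (k,), k, v
            # Descend into the value; non-containers simply yield nothing.
            yield from find_keys(v, keys, path + (k,))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            yield from find_keys(v, keys, path + (i,))


# Hypothetical data, shaped like the key paths shown in the question.
sample = {"calendar": {"weeks": [{"days": [
    {"availabilities": [{"customer_count": 14,
                         "utc_start_at": "2018-09-29T16:45:00+0000",
                         "non_resource_bookable_capacity": 18}]},
    {"availabilities": [{"customer_count": 7,
                         "utc_start_at": "2018-09-29T16:45:00+0000",
                         "non_resource_bookable_capacity": 25}]},
]}]}}

wanted = {"customer_count", "utc_start_at", "non_resource_bookable_capacity"}
for path, key, value in find_keys(sample, wanted):
    print("{!r}: {!r}".format(key, value))
```

Because it walks depth-first, all matches inside one record come out before the next record starts. The order within a record follows the dict's own order, which is only reliable on Python 3.7+.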
benvc
  • I recently noticed the function is randomizing the order of the elements in the new list (results). Can you think of any reason this would be happening or how to ensure elements are in a consistent order? Updated [script](https://zerobin.net/?d359b6afd5b2509c#Z5Lb8jbAL5703YcGS2ZFZUDLm2B3NcRtIgUewa3FNHg=) – TomAudre Oct 05 '18 at 23:45
  • @TomAudre - it is probably because the original function worked through the lists passed to `search_queue` backwards (just because I did not put much thought into preserving order with this function). That said, I edited the answer to work through the list queues forwards which I think should get you the order you are looking for (but I am not 100% sure). – benvc Oct 06 '18 at 00:27
  • Thanks for the update. I think there's something odd with my environment. On OS X (python 3.7) I get consistent order in the elements. On my Debian 8 server (python 3.4.2) I'm still getting random order. Thanks again for taking another look. – TomAudre Oct 06 '18 at 00:57
  • @TomAudre That's because before Python 3.6 dicts were unordered. – misantroop May 30 '19 at 10:58
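
On an interpreter older than 3.6, one possible workaround is to have the json module build `collections.OrderedDict` objects, which keep keys in document order. A minimal sketch, where the `raw` string is a made-up example:

```python
import json
from collections import OrderedDict

raw = '{"b": 1, "a": {"d": 2, "c": 3}}'  # made-up JSON for illustration

# object_pairs_hook receives each JSON object's (key, value) pairs in
# document order, so every level becomes an OrderedDict.
data = json.loads(raw, object_pairs_hook=OrderedDict)

print(list(data))       # ['b', 'a']
print(list(data["a"]))  # ['d', 'c']
```

`OrderedDict` is a subclass of `dict`, so code that checks `isinstance(x, dict)` will still recognise these objects, while an exact `type(x) is dict` check will not.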