I downloaded JSON data from a website, and I want to select specific key:value pairs from the nested JSON. I converted the JSON to a Python dictionary and then used dictionary comprehensions to select the nested key:value pairs. However, there are too many levels of nesting, and I am sure there is a better way than expanding every dictionary separately; my method is redundant. Can you please suggest a better method?

{
    "success": true,
    "payload": {
        "tag": {
            "slug": "python",
            "name": "Python",
            "postCount": 10590,
            "virtuals": {
                "isFollowing": false
            }
        },
        "metadata": {
            "followerCount": 18053,
            "postCount": 10590,
            "coverImage": {
                "id": "1*O3-jbieSsxcQFkrTLp-1zw.gif",
                "originalWidth": 550,
                "originalHeight": 300
            }
        }
    }
}    

My Method:

from datetime import datetime,timedelta

import json,re

data=r'data.json'
#reads json and converts to dictionary
def js_r(data):
    with open(data, encoding='Latin-1') as f_in:
        return json.load(f_in)

def find_key(obj, key):
    if isinstance(obj, dict):
        yield from iter_dict(obj, key, [])
    elif isinstance(obj, list):
        yield from iter_list(obj, key, [])

def iter_dict(d, key, indices):
    for k, v in d.items():
        if k == key:
            yield indices + [k], v
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def iter_list(seq, key, indices):
    for k, v in enumerate(seq):
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

if __name__ == "__main__":
    my_dict = js_r(data)
    print("This is dictionary for python tag", my_dict)
    keys = my_dict.keys()
    print("This is the dictionary keys", my_dict.keys())
    my_payload = list(find_key(my_dict, 'title'))
    print("These are my payload", my_payload)
    my_post = iter_dict(my_dict, 'User', 'id')
    print(list(my_post))
PM 2Ring
Kaleab Woldemariam
  • You may find my code here of interest: http://stackoverflow.com/q/41777880/4014959 – PM 2Ring Oct 31 '17 at 14:02
  • @PM 2Ring If I give the function the nested key I know, it would give me the dictionary nested within it? I apologize if this is a trivial question. – Kaleab Woldemariam Oct 31 '17 at 14:17
  • @wwii Perhaps. To be honest, I'm not totally clear on what Kaleab is doing. Does he really want to create `payload_dict` & `paging_dict` for future use? Or is he only creating them because he thinks he has to, to get at the data he wants? – PM 2Ring Oct 31 '17 at 14:19
  • I suggest playing around with the code I linked and see if it does what you want. Also take a look at https://stackoverflow.com/questions/46700975/how-to-modify-the-key-of-a-nested-json – PM 2Ring Oct 31 '17 at 14:20
  • @PM 2Ring Actually, my intention is to get to the bottom of the nest. payload_dict and paging_dict are not the end result; I want to get the user key further down, which is why I thought my way was redundant. – Kaleab Woldemariam Oct 31 '17 at 14:22
  • In that case, my code should do what you want, and we can close this one as a duplicate. – PM 2Ring Oct 31 '17 at 14:25
  • "If I give the function the nested key I know, it would give me the dictionary nested within it?" Yes. However, `find_key` is designed to handle structures that may have the same key at several places, so it creates an iterator of all the solutions. When you loop over that iterator, each iteration gives you a list of keys / indices as well as the associated value. – PM 2Ring Oct 31 '17 at 14:36
  • If you don't know how deep the dictionary goes, the only way of solving this is with recursion as per @PM2Ring's code. – Carlo Mazzaferro Oct 31 '17 at 15:02
  • @PM 2Ring Yes, I am calling the json data at the top. – Kaleab Woldemariam Nov 02 '17 at 12:18
  • @PM 2Ring I think the JSON data is valid because it came from an API. I edited the posted data.json for brevity. – Kaleab Woldemariam Nov 02 '17 at 12:35
  • Well, it wasn't valid: it gave errors when I tried to pass it to `json.loads`. I had to add several braces to it to make it valid. But anyway, now that I've had a look at your code I don't understand what you're trying to do. There's no need to call the `iter_dict` function directly, you should call the `find_key` generator. However, it looks like you are trying to find the values for the 'title', 'User', and 'id' keys. But that JSON data doesn't have any 'title' or 'User' keys. – PM 2Ring Nov 02 '17 at 12:43
  • @PM 2Ring Sorry for being rigid. Well, I removed those 'title' and 'User' keys for brevity; they exist in the original data. But I want to understand the purpose of the iter_dict() and iter_list() functions, and what they yield. – Kaleab Woldemariam Nov 02 '17 at 12:52
  • @PM 2 Ring. My sincere apologies, on second thought, the data does not have 'id' index with the 'User' key. What is the purpose of the indices argument of iter_dict()? Again, my apologies for unduly lengthy commenting. – Kaleab Woldemariam Nov 02 '17 at 13:00
  • As I said before, don't worry about calling `iter_dict` directly; let `find_key` call it for you. But to answer your question, the `indices` arg in `iter_dict` and `iter_list` is used to gather the `dict` keys and `list` indices that those functions find as they descend into the nested data. – PM 2Ring Nov 02 '17 at 13:12
  • The code I posted shows how to find a single key, or a list or tuple of several different keys. If there are multiple items in the JSON that have the same key, that code will only find the first matching item, but it's easy to find them all if you need that. If you need me to show you how that's done, you'll need to give me some appropriate data. – PM 2Ring Nov 02 '17 at 13:15
  • I've just added a "How it works" section with some more explanation to my answer to "Functions that help to understand json(dict) structure". I hope you find it helpful. – PM 2Ring Nov 02 '17 at 15:56
  • @PM 2Ring Thank you very much for your dedicated answer. – Kaleab Woldemariam Nov 02 '17 at 16:01
  • No worries. We got there eventually. ;) – PM 2Ring Nov 02 '17 at 16:02

2 Answers

Here's how you use my find_key generator from "Functions that help to understand json(dict) structure" to get the 'id' value from that JSON data, along with a few other keys I chose at random. This code gets the JSON data from a string rather than reading it from a file.

import json

json_data = '''\
{
    "success": true,
    "payload": {
        "tag": {
            "slug": "python",
            "name": "Python",
            "postCount": 10590,
            "virtuals": {
                "isFollowing": false
            }
        },
        "metadata": {
            "followerCount": 18053,
            "postCount": 10590,
            "coverImage": {
                "id": "1*O3-jbieSsxcQFkrTLp-1zw.gif",
                "originalWidth": 550,
                "originalHeight": 300
            }
        }
    }
}
'''

data = r'data.json'

#def js_r(data):
    #with open(data, encoding='Latin-1') as f_in:
        #return json.load(f_in)

# Read the JSON from the inline json_data string instead of from the data file
def js_r(data):
    return json.loads(json_data)

def find_key(obj, key):
    if isinstance(obj, dict):
        yield from iter_dict(obj, key, [])
    elif isinstance(obj, list):
        yield from iter_list(obj, key, [])

def iter_dict(d, key, indices):
    for k, v in d.items():
        if k == key:
            yield indices + [k], v
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def iter_list(seq, key, indices):
    for k, v in enumerate(seq):
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

if __name__=="__main__":
    # Read the JSON data
    my_dict = js_r(data)
    print("This is the JSON data:")
    print(json.dumps(my_dict, indent=4), "\n")

    # Find the id key
    keypath, val = next(find_key(my_dict, "id"))
    print("This is the id: {!r}".format(val))
    print("These are the keys that lead to the id:", keypath, "\n")

    # Find the name, followerCount, originalWidth, and originalHeight
    print("Here are some more (key, value) pairs")
    keys = ("name", "followerCount", "originalWidth", "originalHeight")
    for k in keys:
        keypath, val = next(find_key(my_dict, k))
        print("{!r}: {!r}".format(k, val))

output

This is the JSON data:
{
    "success": true,
    "payload": {
        "tag": {
            "slug": "python",
            "name": "Python",
            "postCount": 10590,
            "virtuals": {
                "isFollowing": false
            }
        },
        "metadata": {
            "followerCount": 18053,
            "postCount": 10590,
            "coverImage": {
                "id": "1*O3-jbieSsxcQFkrTLp-1zw.gif",
                "originalWidth": 550,
                "originalHeight": 300
            }
        }
    }
} 

This is the id: '1*O3-jbieSsxcQFkrTLp-1zw.gif'
These are the keys that lead to the id: ['payload', 'metadata', 'coverImage', 'id'] 

Here are some more (key, value) pairs
'name': 'Python'
'followerCount': 18053
'originalWidth': 550
'originalHeight': 300
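
The code above takes only the first match for each key via next(). As a minimal sketch of the "find them all" case mentioned in the comments, the snippet below collects every occurrence of 'postCount', replays a returned keypath against the data, and passes a default to next() so a missing key does not raise StopIteration. It assumes a condensed single-function variant of find_key (not the exact code above) and a trimmed copy of the sample JSON.

```python
import json
from functools import reduce
from operator import getitem

def find_key(obj, key, path=()):
    """Yield (keypath, value) for every occurrence of `key` in nested dicts/lists.

    Condensed variant written for this sketch; it is meant to behave like the
    find_key / iter_dict / iter_list trio shown above.
    """
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield list(path) + [k], v
            yield from find_key(v, key, path + (k,))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            yield from find_key(v, key, path + (i,))

# Trimmed copy of the sample data from the question
data = json.loads('''
{
    "payload": {
        "tag": {"name": "Python", "postCount": 10590},
        "metadata": {
            "followerCount": 18053,
            "postCount": 10590,
            "coverImage": {"id": "1*O3-jbieSsxcQFkrTLp-1zw.gif"}
        }
    }
}
''')

# Collect every match instead of stopping at the first one
all_post_counts = list(find_key(data, "postCount"))
for keypath, val in all_post_counts:
    print(keypath, val)

# A keypath can be replayed against the data to fetch the same value again
id_path, id_val = next(find_key(data, "id"))
replayed = reduce(getitem, id_path, data)

# Passing a default to next() avoids StopIteration when the key is absent
missing = next(find_key(data, "title"), None)
```

Because each result carries its full keypath, duplicate keys such as 'postCount' can be told apart by where they sit in the structure.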

BTW, JSON normally uses a UTF encoding, not Latin-1. The default encoding is UTF-8, so you should use that if possible.
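
To illustrate the encoding point, here is a small sketch that writes a UTF-8 JSON file containing a non-ASCII character and reads it back twice, once as UTF-8 and once as Latin-1. The load_json helper, the temporary file, and the sample value are assumptions made for this example; they are not part of the original code.

```python
import json
import os
import tempfile

def load_json(path, encoding="utf-8"):
    """Read a JSON file, decoding it as UTF-8 unless told otherwise."""
    with open(path, encoding=encoding) as f_in:
        return json.load(f_in)

# Hypothetical sample with one non-ASCII character
sample = {"name": "Pyth\u00f6n", "postCount": 10590}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    with open(path, "w", encoding="utf-8") as f_out:
        json.dump(sample, f_out, ensure_ascii=False)

    as_utf8 = load_json(path)                        # decoded correctly
    as_latin1 = load_json(path, encoding="latin-1")  # parses, but text is garbled

print(as_utf8["name"])
print(as_latin1["name"])
```

Reading with Latin-1 does not fail here; it silently turns each multi-byte UTF-8 character into several wrong characters, which is why the mismatch is easy to miss.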

PM 2Ring

I suggest using python-benedict, a solid Python dict subclass with full keypath support and many utility methods.

It provides I/O support for many formats, including JSON.

You can initialize it directly from the json file:

from benedict import benedict

d = benedict.from_json('data.json')

Now your dict has keypath support:

print(d['payload.metadata.coverImage.id'])

# or use get to avoid a possible KeyError
print(d.get('payload.metadata.coverImage.id'))

Installation: pip install python-benedict

Here is the library repository and documentation: https://github.com/fabiocaccamo/python-benedict

Note: I am the author of this project.
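
If adding a dependency is not an option, a dotted-keypath lookup can be approximated with the standard library. The sketch below is only an illustration of the idea, not how python-benedict is implemented; get_by_keypath is a hypothetical helper, and it assumes keys contain no dots and that the path passes through dicts only (no list indices).

```python
def get_by_keypath(data, keypath, default=None, sep="."):
    """Follow a dotted keypath through nested dicts; return default if any step is missing.

    Hypothetical helper for illustration only.
    """
    current = data
    for part in keypath.split(sep):
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return default
    return current

# Trimmed copy of the sample data from the question
data = {
    "payload": {
        "metadata": {
            "followerCount": 18053,
            "coverImage": {"id": "1*O3-jbieSsxcQFkrTLp-1zw.gif"},
        }
    }
}

cover_id = get_by_keypath(data, "payload.metadata.coverImage.id")
no_title = get_by_keypath(data, "payload.metadata.title")
print(cover_id, no_title)
```

This only works when the full path is known in advance; when it is not, a recursive search such as find_key is the better fit.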

Martijn Pieters
Fabio Caccamo