
I have been looking at the answer to the following question here: How can I select deeply nested key:values from dictionary in python

My issue, however, is not finding a single key inside a deeply nested data structure, but finding all occurrences of a particular key.

For example, suppose we modify the data structure from the first example in that question:

[ "stats":{ "success": true, "payload": { "tag": { "slug": "python", "name": "Python", "postCount": 10590, "virtuals": { "isFollowing": false } }, "metadata": { "followerCount": 18053, "postCount": 10590, "coverImage": { "id": "1*O3-jbieSsxcQFkrTLp-1zw.gif", "originalWidth": 550, "originalHeight": 300 } } } }, "stats": { "success": true, "payload": { "tag": { "slug": "python", "name": "Python", "postCount": 10590, "virtuals": { "isFollowing": false } }, "metadata": { "followerCount": 18053, "postCount": 10590, "coverImage": { "id": "1*O3-jbieSsxcQFkrTLp-1zw.gif", "originalWidth": 550, "originalHeight": 300 } } } } ]

How would I get every occurrence of "metadata" here?

Yazanator

2 Answers


How about something recursive?

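def extractVals(obj, key, resList):
    """Append the value of every occurrence of `key` in obj to resList."""
    if isinstance(obj, dict):
        if key in obj:
            resList.append(obj[key])
        # Keep descending: the key may also occur deeper inside the values.
        for value in obj.values():
            extractVals(value, key, resList)
    elif isinstance(obj, list):
        for item in obj:
            extractVals(item, key, resList)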

# dat is the corrected data structure from the question (see the note below).
resultList1 = []
extractVals(dat, 'metadata', resultList1)
print(resultList1)

yields:

[{'coverImage': {'id': '1*O3-jbieSsxcQFkrTLp-1zw.gif',
 'originalHeight': 300,
 'originalWidth': 550},
 'followerCount': 18053,
 'postCount': 10590},
{'coverImage': {'id': '1*O3-jbieSsxcQFkrTLp-1zw.gif',
 'originalHeight': 300,
 'originalWidth': 550},
 'followerCount': 18053,
 'postCount': 10590}]

I also had to modify your dataset slightly to make it a valid Python structure: true -> True, false -> False, and I removed the keys from the top-level list.
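
If the data arrives as JSON text rather than as a Python literal, the true/false conversion does not have to be done by hand, because `json.loads` maps them to `True`/`False`. Below is a minimal sketch under that assumption. The trimmed `raw` string (with each "stats" entry wrapped in its own object so the list is valid JSON) and the generator name `iter_values` are illustrative, not part of the original data or the function above.

```python
import json

# Assumption: a trimmed, corrected version of the question's data, with each
# "stats" entry wrapped in its own object so that the list is valid JSON.
raw = """
[
  {"stats": {"success": true,
             "payload": {"tag": {"slug": "python"},
                         "metadata": {"followerCount": 18053, "postCount": 10590}}}},
  {"stats": {"success": true,
             "payload": {"tag": {"slug": "python"},
                         "metadata": {"followerCount": 18053, "postCount": 10590}}}}
]
"""

dat = json.loads(raw)  # JSON true/false become Python True/False


def iter_values(obj, key):
    """Yield the value of every occurrence of `key`, at any depth."""
    if isinstance(obj, dict):
        if key in obj:
            yield obj[key]
        for value in obj.values():
            yield from iter_values(value, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_values(item, key)


found = list(iter_values(dat, "metadata"))
print(found)
```

A generator avoids passing a result list around; wrap it in `list(...)` when all matches are needed at once.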

Joshua R.

You can use a custom class like this one:

class DeepDict:

    def __init__(self, data):
        self.data = data

    @classmethod
    def _deep_find(cls, data, key, root, response):
        if root:
            root += "."
        if isinstance(data, list):
            for i, item in enumerate(data):
                cls._deep_find(item, key, root + str(i), response)
        elif isinstance(data, dict):
            if key in data:
                response.append(root + key)
            for data_key, value in data.items():
                cls._deep_find(value, key, root + data_key, response)
        return response

    def deep_find(self, key):
        """Return a dotted path for every occurrence of `key`.
        Use `deep_get` to retrieve the value at a given path, or
        `get_all` to iterate over the values of every occurrence of the key.
        """
        return self._deep_find(self.data, key, root="", response=[])

    @classmethod
    def _deep_get(cls, data, path):
        # Walk the path without mutating the caller's list.
        for part in path:
            # Only list levels take integer indexes; dict keys stay strings.
            data = data[int(part)] if isinstance(data, list) else data[part]
        return data

    def deep_get(self, path):
        if isinstance(path, str):
            path = path.split(".")
        return self._deep_get(self.data, path)

    def get_all(self, key):
        for path in self.deep_find(key):
            yield self.deep_get(path)

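    def __getitem__(self, key):
        # Accept list indexes given either as ints or as digit strings.
        if isinstance(self.data, list) and isinstance(key, str) and key.isdigit():
            key = int(key)
        return self.data[key]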

(Note that although I named it "DeepDict", it is actually a generic JSON container that works with both lists and dicts as the outermost element. BTW, the JSON fragment in your question is broken: each "stats": key should be wrapped in an extra { }.)

So these three custom methods can either give you the precise "path" to each occurrence of a key (deep_find, with deep_get to fetch the value at a given path), or you can use the get_all method to simply iterate over the contents of every key with that name in the structure.

With the above class, after fixing your data I did:

data = DeepDict(<data structure above (fixed)>)
list(data.get_all("metadata"))

and got as output:

[{'coverImage': {'id': '1*O3-jbieSsxcQFkrTLp-1zw.gif',
   'originalHeight': 300,
   'originalWidth': 550},
  'followerCount': 18053,
  'postCount': 10590},
 {'coverImage': {'id': '1*O3-jbieSsxcQFkrTLp-1zw.gif',
   'originalHeight': 300,
   'originalWidth': 550},
  'followerCount': 18053,
  'postCount': 10590}]
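
To make the "path" idea concrete, here is a small standalone sketch, separate from the DeepDict class, that records each path as a tuple of keys and indexes instead of a dotted string. That is a deliberate variation rather than what the class does: tuples stay unambiguous when a key itself contains a dot or consists only of digits. The function names `find_paths` and `get_path` and the sample `data` are hypothetical.

```python
from functools import reduce
from operator import getitem


def find_paths(data, key, prefix=()):
    """Yield a tuple path to every occurrence of `key` in nested dicts/lists."""
    if isinstance(data, dict):
        if key in data:
            yield prefix + (key,)
        for k, value in data.items():
            yield from find_paths(value, key, prefix + (k,))
    elif isinstance(data, list):
        for i, item in enumerate(data):
            yield from find_paths(item, key, prefix + (i,))


def get_path(data, path):
    """Follow a tuple path produced by find_paths and return the value there."""
    return reduce(getitem, path, data)


# Hypothetical sample shaped like the corrected data in the question.
data = [
    {"stats": {"payload": {"metadata": {"postCount": 10590}}}},
    {"stats": {"payload": {"metadata": {"postCount": 10590}}}},
]

paths = list(find_paths(data, "metadata"))
print(paths)
# [(0, 'stats', 'payload', 'metadata'), (1, 'stats', 'payload', 'metadata')]
print(get_path(data, paths[1]))
# {'postCount': 10590}
```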
jsbueno