Parsing json and searching through it

Question

I have this code

import json
from pprint import pprint
json_data=open('bookmarks.json')
jdata = json.load(json_data)
pprint (jdata)
json_data.close()

How can I search through it for u'uri': u'http:'?

I converted the JSON to Pandas and then used its filtering interface. Possible X/Y problem on my end: https://xyproblem.info/ — Subatomic Tripod, Jan 23 '22 at 19:56

score 34 · Answer 1 · answered Jan 05 '17 at 23:40

ObjectPath is a library that provides the ability to query JSON and nested structures of dicts and lists. For example, you can search for all attributes called "foo", regardless of how deep they are, by using $..foo.

While the documentation focuses on the command line interface, you can perform the queries programmatically by using the package's Python internals. The example below assumes you've already loaded the data into Python data structures (dicts & lists). If you're starting with a JSON file or string you just need to use load or loads from the json module first.

import objectpath

data = [
    {'foo': 1, 'bar': 'a'},
    {'foo': 2, 'bar': 'b'},
    {'NoFooHere': 2, 'bar': 'c'},
    {'foo': 3, 'bar': 'd'},
]

tree_obj = objectpath.Tree(data)

tuple(tree_obj.execute('$..foo'))
# returns: (1, 2, 3)

Notice that it just skipped elements that lacked a "foo" attribute, such as the third item in the list. You can also do much more complex queries, which makes ObjectPath handy for deeply nested structures (e.g. finding where x has y that has z: $.x.y.z). I refer you to the documentation and tutorial for more information.
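
For readers who would rather not add a dependency, the `$..foo` idea can be approximated with a short recursive generator. This is a plain-Python sketch (Python 3), not how ObjectPath itself is implemented; the name `find_key` and the extra nested entry in the sample data are made up for illustration.

```python
def find_key(node, key):
    """Yield every value stored under `key`, at any depth of nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the value itself may contain more matches.
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


data = [
    {'foo': 1, 'bar': 'a'},
    {'foo': 2, 'bar': 'b'},
    {'NoFooHere': 2, 'bar': 'c'},
    {'foo': 3, 'bar': 'd', 'nested': {'foo': 4}},  # hypothetical nested entry
]

result = tuple(find_key(data, 'foo'))
print(result)
```

Unlike a query language, this only matches on key names, but the body of the loop is ordinary Python and can be adapted to any condition.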

To note, objectpath uses https://goessner.net/articles/JsonPath/ for searching (or near enough to jsonpath), which is a mini language like xpath that you'll need to become familiar with. IMO @jro's recursive answer (the currently accepted one) is better as it walks the tree and allows you to execute arbitrary Python code against each element to find what you are looking for. — EoghanM, Dec 19 '18 at 11:16

jro · Accepted Answer · 2011-12-06T07:53:27.823

As json.loads simply returns a dict, you can use the operators that apply to dicts:

>>> jdata = json.loads('{"uri": "http:", "foo": "bar"}')
>>> 'uri' in jdata       # Check if 'uri' is in jdata's keys
True
>>> jdata['uri']         # Will return the value belonging to the key 'uri'
u'http:'
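
The two entry points of the json module are easy to mix up: `json.loads` takes a string, while `json.load` takes a file-like object. A minimal sketch of the difference follows; `io.StringIO` stands in for a real open file, which is an assumption made only to keep the example self-contained.

```python
import io
import json

text = '{"uri": "http:", "foo": "bar"}'

# json.loads parses a JSON document held in a string.
from_string = json.loads(text)

# json.load reads from a file-like object; io.StringIO stands in for an open file.
from_file = json.load(io.StringIO(text))

print('uri' in from_string)
print(from_string['uri'])
print(from_string == from_file)
```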

Edit: to give an idea regarding how to loop through the data, consider the following example:

>>> import json
>>> jdata = json.loads(open ('bookmarks.json').read())
>>> for c in jdata['children'][0]['children']:
...     print('Title: {}, URI: {}'.format(c.get('title', 'No title'),
...                                       c.get('uri', 'No uri')))
...
Title: Recently Bookmarked, URI: place:folder=BOOKMARKS_MENU(...)
Title: Recent Tags, URI: place:sort=14&type=6&maxResults=10&queryType=1
Title: , URI: No uri
Title: Mozilla Firefox, URI: No uri

Inspecting the jdata data structure will allow you to navigate it as you wish. The pprint call you already have is a good starting point for this.
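
When the pprint output is too long to read comfortably, a condensed outline of the keys can help with that inspection. The following is only a sketch (Python 3): `outline` is a hypothetical helper, and the sample dict is a made-up stand-in for the parsed bookmarks file.

```python
def outline(node, depth=0, max_depth=2):
    """Return indented lines naming the dict keys found down to max_depth."""
    lines = []
    if depth > max_depth:
        return lines
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append('{}{} ({})'.format('  ' * depth, key, type(value).__name__))
            lines.extend(outline(value, depth + 1, max_depth))
    elif isinstance(node, list) and node:
        # Look at the first element only; list items usually share a shape.
        lines.extend(outline(node[0], depth, max_depth))
    return lines


# Hypothetical stand-in for the parsed bookmarks.json
sample = {'title': '', 'children': [{'title': 'Menu', 'uri': 'http://example.com/'}]}

shape = outline(sample)
print('\n'.join(shape))
```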

Edit2: Another attempt. This gets the file you mentioned in a list of dictionaries. With this, I think you should be able to adapt it to your needs.

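>>> from pprint import pprint
>>> def build_structure(data, d=None):
...     if d is None:
...         d = []   # fresh list per top-level call (a default of d=[] would be shared between calls)
...     if 'children' in data:
...         for c in data['children']:
...             d.append({'title': c.get('title', 'No title'),
...                       'uri': c.get('uri', None)})
...             build_structure(c, d)
...     return d
...
>>> pprint(build_structure(jdata))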
[{'title': u'Bookmarks Menu', 'uri': None},
 {'title': u'Recently Bookmarked',
  'uri':   u'place:folder=BOOKMARKS_MENU&folder=UNFILED_BOOKMARKS&(...)'},
 {'title': u'Recent Tags',
  'uri':   u'place:sort=14&type=6&maxResults=10&queryType=1'},
 {'title': u'', 'uri': None},
 {'title': u'Mozilla Firefox', 'uri': None},
 {'title': u'Help and Tutorials',
  'uri':   u'http://www.mozilla.com/en-US/firefox/help/'},
 (...)]

To then "search through it for u'uri': u'http:'", do something like this:

for c in build_structure(jdata):
    uri = c['uri']
    if uri and uri.startswith('http:'):   # 'uri' is None for entries without a URI
        print('Started with http')
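
The flattening and the filtering can also be combined into one pass. The sketch below is written for Python 3 and assumes, as the examples above do, that folders keep their entries under a 'children' key; `iter_bookmarks` and the sample data are hypothetical.

```python
def iter_bookmarks(node, prefix='http:'):
    """Yield (title, uri) for every entry whose 'uri' starts with `prefix`."""
    uri = node.get('uri')
    if uri and uri.startswith(prefix):
        yield node.get('title', 'No title'), uri
    # Folders hold their entries under 'children'; leaves simply have none.
    for child in node.get('children', []):
        yield from iter_bookmarks(child, prefix)


# Hypothetical stand-in for the parsed bookmarks.json
sample = {
    'title': '',
    'children': [
        {'title': 'Bookmarks Menu', 'children': [
            {'title': 'Recent Tags', 'uri': 'place:sort=14&type=6'},
            {'title': 'Help and Tutorials',
             'uri': 'http://www.mozilla.com/en-US/firefox/help/'},
        ]},
    ],
}

matches = list(iter_bookmarks(sample))
for title, uri in matches:
    print('{}: {}'.format(title, uri))
```

Because it is a generator, the search can stop early, for example with `next(iter_bookmarks(sample), None)` to fetch only the first match.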

I get `ValueError: zero length field name in format` (line 3) when I try to run the second example — BKovac, Dec 05 '11 at 12:00
That is probably related to the layout of the bookmarks you exported... I don't really know the format, but I'd guess it makes a `children` key for every folder or container you have in your bookmarks. Try it for example with `for c in jdata['children']:` instead of the above. Also, note that `str.format()` is new in Python 2.6, and the empty `{}` placeholders need 2.7 or later; you might have an older version. If so, replace that line with `print 'Title: %s, URI: %s' % (c.get('title', 'No title'), c.get('uri', 'No uri'))`. — jro, Dec 05 '11 at 12:21

score 1 · Answer 3 · edited Aug 14 '20 at 09:53


jro's original snippet had a typo: the JSON string was missing the colon between "foo" and "bar".

The correct syntax would be (using json.loads, since the input is a string rather than a file):

jdata = json.loads('{"uri": "http:", "foo": "bar"}')

This cleared it up for me when playing with the code.
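
To see how the parser reports this kind of mistake, the malformed and corrected strings can be compared directly. This is a small sketch assuming Python 3.5 or later, where `json.JSONDecodeError` (a subclass of `ValueError`) is available.

```python
import json

bad = '{"uri": "http:", "foo", "bar"}'    # comma where a colon is expected
good = '{"uri": "http:", "foo": "bar"}'

error = None
try:
    json.loads(bad)
except json.JSONDecodeError as exc:       # subclass of ValueError
    error = exc
    # exc.pos is the character offset where parsing failed.
    print('Invalid JSON at position {}: {}'.format(exc.pos, exc.msg))

jdata = json.loads(good)
print(jdata['foo'])
```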

edited Aug 14 '20 at 09:53 · sɐunıɔןɐqɐp

answered Apr 02 '17 at 18:20 · PythonPadawan
Van4ozA · Answer 4 · 2017-06-10T20:56:57.340

Functions to search through and print nested dicts, such as parsed JSON (written for Python 3).

Search:

def pretty_search(dict_or_list, key_to_search, search_for_first_only=False):
    """
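    Search a dict, or a list of dicts, for every value stored under
    key_to_search, descending into all nested dicts, lists and sets.
    Matches are collected in a set, so the matched values must be hashable.

    :param dict_or_list: the (possibly nested) structure to search
    :param key_to_search: the dict key whose values should be collected
    :param search_for_first_only: if True, return the first value found
        instead of a set of all values
    :return: a set of the values found (or the first value), None if no match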
    """
    search_result = set()
    if isinstance(dict_or_list, dict):
        for key in dict_or_list:
            key_value = dict_or_list[key]
            if key == key_to_search:
                if search_for_first_only:
                    return key_value
                else:
                    search_result.add(key_value)
            if isinstance(key_value, dict) or isinstance(key_value, list) or isinstance(key_value, set):
                _search_result = pretty_search(key_value, key_to_search, search_for_first_only)
                if _search_result and search_for_first_only:
                    return _search_result
                elif _search_result:
                    for result in _search_result:
                        search_result.add(result)
    elif isinstance(dict_or_list, list) or isinstance(dict_or_list, set):
        for element in dict_or_list:
            if isinstance(element, list) or isinstance(element, set) or isinstance(element, dict):
                _search_result = pretty_search(element, key_to_search, search_for_first_only)
                if _search_result and search_for_first_only:
                    return _search_result
                elif _search_result:
                    for result in _search_result:
                        search_result.add(result)
    return search_result if search_result else None

Print:

def pretty_print(dict_or_list, print_spaces=0):
    """
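    Build a tab-indented, printable version of a dict or a list of dicts.
    The text is printed once, by the top-level call, and also returned.

    :param dict_or_list: the (possibly nested) structure to format
    :param print_spaces: current indentation depth, in tabs (used by the recursion)
    :return: the formatted text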
    """
    pretty_text = ""
    if isinstance(dict_or_list, dict):
        for key in dict_or_list:
            key_value = dict_or_list[key]
            if isinstance(key_value, dict):
                key_value = pretty_print(key_value, print_spaces + 1)
                pretty_text += "\t" * print_spaces + "{}:\n{}\n".format(key, key_value)
            elif isinstance(key_value, list) or isinstance(key_value, set):
                pretty_text += "\t" * print_spaces + "{}:\n".format(key)
                for element in key_value:
                    if isinstance(element, dict) or isinstance(element, list) or isinstance(element, set):
                        pretty_text += pretty_print(element, print_spaces + 1)
                    else:
                        pretty_text += "\t" * (print_spaces + 1) + "{}\n".format(element)
            else:
                pretty_text += "\t" * print_spaces + "{}: {}\n".format(key, key_value)
    elif isinstance(dict_or_list, list) or isinstance(dict_or_list, set):
        for element in dict_or_list:
            if isinstance(element, dict) or isinstance(element, list) or isinstance(element, set):
                pretty_text += pretty_print(element, print_spaces + 1)
            else:
                pretty_text += "\t" * print_spaces + "{}\n".format(element)
    else:
        pretty_text += str(dict_or_list)
    if print_spaces == 0:
        print(pretty_text)
    return pretty_text
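
If the goal is only a readable dump of JSON-compatible data, the standard library already covers it. A minimal sketch using `json.dumps` with `indent`, as an alternative to (not a replacement for) the function above; the sample dict is made up.

```python
import json

# Hypothetical sample shaped like one bookmarks folder
data = {'title': 'Bookmarks Menu',
        'children': [{'title': 'Recent Tags', 'uri': 'place:sort=14'}]}

# indent=4 puts each key on its own line; sort_keys makes the order stable.
pretty_text = json.dumps(data, indent=4, sort_keys=True)
print(pretty_text)
```

Note that `json.dumps` only accepts JSON-compatible types, so a structure containing sets would need converting first.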

number5 · Answer 5 · 2017-09-04T02:17:16.093

-2

You can use jsonpipe if you just need the output (and are more comfortable with the command line):

cat bookmarks.json | jsonpipe |grep uri
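
For a Python-only variant of the same idea, one can flatten the parsed data into path/value lines and filter those. This sketch (Python 3) only imitates the general approach and does not claim to reproduce jsonpipe's actual output format; `flatten` and the sample JSON are hypothetical.

```python
import json


def flatten(node, path=''):
    """Yield (path, value) pairs for every leaf in nested dicts/lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, '{}/{}'.format(path, key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, '{}/{}'.format(path, index))
    else:
        yield path, node


# Hypothetical stand-in for the contents of bookmarks.json
jdata = json.loads('{"children": [{"title": "Help", "uri": "http://example.com/"}]}')

# One "path<TAB>value" line per leaf, then a grep-like filter on 'uri'.
lines = ['{}\t{}'.format(p, json.dumps(v)) for p, v in flatten(jdata)]
uri_lines = [line for line in lines if 'uri' in line]
print('\n'.join(uri_lines))
```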

edited Sep 04 '17 at 02:17

answered Dec 05 '11 at 11:52 · number5

jsonpipe link seems to be changed or removed – Suresh Prajapati Sep 01 '17 at 13:56
@SureshPrajapati fixed – number5 Sep 04 '17 at 02:17
This is not a python solution. – Kartoch Aug 14 '20 at 08:59
