I am importing and manipulating some deeply nested JSON (imported as a dictionary). I can assign the values just fine using code like:

query['query']['function_score']['query']['multi_match']['operator'] = 'or'
query['query']['function_score']['query']['multi_match'].update({
        'minimum_should_match' : '80%' })  

But it's ugly and cumbersome as nuts. I'm wondering if there's a cleaner way to assign values to deep-nested keys that's reasonably efficient?

I've read about possibly using an in-memory SQLite db, but the data is going back into JSON after a bit of manipulation.

tarponjargon

3 Answers

4
multi_match = query['query']['function_score']['query']['multi_match']
multi_match['operator'] = 'or'
multi_match.update({'minimum_should_match' : '80%' })
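
The same idea can be generalized when the path is held in a list. This is only a minimal sketch, assuming a hypothetical helper named `get_nested` (it is not part of the standard library) that walks the path with `functools.reduce`; the sample `query` dict is made up for illustration:

```python
from functools import reduce
import operator


def get_nested(data, path):
    """Return the object reached by following `path` (a list of keys or indexes)."""
    # reduce applies data[path[0]][path[1]]... one step at a time.
    return reduce(operator.getitem, path, data)


# Hypothetical sample data shaped like the query in the question.
query = {'query': {'function_score': {'query': {'multi_match': {}}}}}

# Grab a reference to the inner dict once, then mutate it in place.
multi_match = get_nested(query, ['query', 'function_score', 'query', 'multi_match'])
multi_match['operator'] = 'or'
multi_match['minimum_should_match'] = '80%'
```

Because `get_nested` returns a reference to the inner dict rather than a copy, the assignments change `query` itself. A missing key along the path raises `KeyError`, just as the chained lookups would.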
宏杰李
3

JSONPath (via 'jsonpath_rw') makes it less cumbersome:

Previous:

>>> query
{u'query': {u'function_score': {u'query': {u'multi_match': {u'min_should_match': u'20%'}}}}}

Update:

>>> import jsonpath_rw
>>> found = jsonpath_rw.parse("$..multi_match").find(query)[0]
>>> found.value["operator"] = "or"
>>> found.value["min_should_match"] = "80%"

Afterwards:

>>> query
{u'query': {u'function_score': {u'query': {u'multi_match': {'min_should_match': '80%', u'operator': u'or'}}}}}
ma-ti
2

The chosen answer is definitely the way to go. The problem I (later) found is that my nested key can appear at varying levels. So I needed to be able to traverse the dict and find the path to the node first, and THEN do the update or addition.

jsonpath_rw was the immediate solution, but I got some strange results trying to use it. I gave up after a couple hours of wrestling with it.

At the risk of getting shot down for being a clunky newb, I did end up fleshing out a few functions (based on other code I found on SO) that natively do some nice things to address my needs:

def find_in_obj(obj, condition, path=None):
    ''' generator finds full path to nested dict key when key is at an unknown level 
        borrowed from http://stackoverflow.com/a/31625583/5456148'''
    if path is None:
        path = []

    # In case this is a list
    if isinstance(obj, list):
        for index, value in enumerate(obj):
            new_path = list(path)
            new_path.append(index)
            for result in find_in_obj(value, condition, path=new_path):
                yield result

    # In case this is a dictionary
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_path = list(path)
            new_path.append(key)
            for result in find_in_obj(value, condition, path=new_path):
                yield result

            if condition == key:
                new_path = list(path)
                new_path.append(key)
                yield new_path


def set_nested_value(nested_dict, path_list, key, value):
    ''' add or update a value in a nested dict using passed list as path
        borrowed from http://stackoverflow.com/a/11918901/5456148'''
    cur = nested_dict
    # Build a new list so the caller's path_list is not mutated
    path_list = list(path_list) + [key]
    for path_item in path_list[:-1]:
        try:
            cur = cur[path_item]
        except KeyError:
            cur = cur[path_item] = {}

    cur[path_list[-1]] = value
    return nested_dict


def update_nested_dict(nested_dict, findkey, updatekey, updateval):
    ''' finds and updates values in nested dicts with find_in_obj(), set_nested_value()'''
    return set_nested_value(
        nested_dict,
        list(find_in_obj(nested_dict, findkey))[0],
        updatekey,
        updateval
    )

find_in_obj() is a generator that finds a path to a given nested key.

set_nested_value() will either update the key/value in the dict at the given path list, or add it if it doesn't exist.

update_nested_dict() is a "wrapper" for the two functions that takes in the nested dict to search, the key you're looking for and the key value to update (or add if it doesn't exist).

So I can pass in:

q = update_nested_dict(q, 'multi_match', 'operator', 'or')
q = update_nested_dict(q, 'multi_match', 'minimum_should_match', '80%')

And the "operator" value is updated, and the 'minimum_should_match' key/value is added under the 'multi_match' node, no matter what level it appears in the dictionary.

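Might run into problems if the searched key exists in more than 1 place in the dictionary though: update_nested_dict() only acts on the first path that find_in_obj() yields, and it raises an IndexError if the key isn't found at all.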
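
One way around the multiple-match limitation is to update every occurrence instead of only the first. This is just a rough sketch, assuming two hypothetical helpers, `find_paths` and `update_all` (new names, not the functions above), and a made-up sample dict:

```python
def find_paths(obj, target_key, path=()):
    """Yield the path (a tuple of keys/indexes) to every occurrence of target_key."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == target_key:
                yield path + (key,)
            # Keep descending, since the key may also appear deeper down.
            for found in find_paths(value, target_key, path + (key,)):
                yield found
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            for found in find_paths(value, target_key, path + (index,)):
                yield found


def update_all(nested, find_key, update_key, update_val):
    """Set update_key to update_val under every dict stored at find_key."""
    # Collect the paths first so the structure is not modified while iterating.
    for path in list(find_paths(nested, find_key)):
        node = nested
        for step in path:
            node = node[step]
        # Only dict nodes can take a new key; anything else is skipped.
        if isinstance(node, dict):
            node[update_key] = update_val
    return nested


# Hypothetical data with 'multi_match' at two different levels.
q = {
    'query': {'multi_match': {'operator': 'and'}},
    'filters': [{'multi_match': {}}],
}
update_all(q, 'multi_match', 'operator', 'or')
```

Whether updating every match is what you want depends on the data; if only one specific node should change, it is safer to pick the path explicitly from the list of paths.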

tarponjargon