1

Say I have a nested dictionary like so:

{
    'a': {
        'z': [
            {
                'x': 1,
                'y': [
                    [{'a': 10}, {'f': 21}],
                    {'m': 100},
                ]
            }
        ]
    }
    'm': {
        'x': 100
    }
}

And let's say I have a list of paths that have no relative order to each other:

[('a', 'z', 'x'), ('m', 'x'), ('a', 'z', 'y', 'a'), ...]

And I want to update whatever value the final key is for all these paths to some value val. Notice that in the third tuple, the last key is a, but it sits in a list.

In my data, these assumptions can be made:

  • key names can be repeated in a single path (notice 'a' is the source and destination node in the third tuple)
  • destination "nodes" are always a string, int, float, boolean, i.e. - never ending on a list or another dictionary
  • lists only hold other lists or dictionaries

My thoughts:

  1. My current implementation is a brute force approach which is too slow for my requirements: iterate from source to destination for every single one of these paths, detecting if I am on a list or dictionary.
  2. Ideally, I would back track and visit nearby nodes that need to be visited as well, so that I don't have to start from the top level again. This kind of reminds me of visiting multiple stops/cities on a trip.
  3. I was also thinking of a trie, but I think that would still mean starting from source to destination. Also I don't think a trie would work since we have lists.

Any thoughts?

TheRealFakeNews
  • 7,512
  • 16
  • 73
  • 114
  • It will be inefficient to have to scan lists in order to find a certain key/value. Moreover, there might be multiple list elements that match. Is it not possible to include in your paths the *index* of the intended list element? The third example would then have to be `('a', 'z', 'y', 0, 'a')`. That would make a solution efficient and unambiguous. – trincot Aug 04 '22 at 06:59
  • Should a solution like this - https://stackoverflow.com/a/73144408/5387738 suit you, let me know. We could try to build something similar. – westandskif Aug 04 '22 at 12:06
  • @trincot that would be ideal, but it's not the case. Fortunately the lists are capped, so it's constant in that sense. The trickier part IMO is having to re-traverse the common paths / prefixes that may have already been traversed in another path. – TheRealFakeNews Aug 05 '22 at 00:29
  • @TheRealFakeNews Hi, have you tried my solution? The caching should help with not revisiting paths; I'm just curious if it performs any better. Thanks! – thariqfahry Aug 05 '22 at 02:00
  • @thariqfahry I did take a look at it and I like the idea. Was waiting to see if anyone else would post a solution. – TheRealFakeNews Aug 10 '22 at 04:03

1 Answers1

1

You can cache previously-visited nodes in a dictionary and store their references.

# Helper function to flatten nested lists.
def flatten(xs):
    for x in xs:
        if isinstance(x, list):
            yield from flatten(x)
        else:
            yield x


# The cache stores all already-visited paths.
cache = {}


def setValue(nested_dict, path, value):
    queue = nested_dict
    depth = 0

    # Find the largest subpath that's already in the cache (=already visited),
    # starting with the full path and removing one key at a time from the end.
    for end_index in range(len(path), 0, -1):
        if path[0:end_index] in cache:
            queue = cache[path[0:end_index]]
            depth = len(path[0:end_index])
            break

    # For the rest of the keys in the path (unvisited), visit them and
    # store their references in the cache.
    while depth < len(path) - 1:
        if type(queue) is dict:
            queue = queue[path[depth]]
        elif type(queue) is list:
            queue = [i for i in flatten(queue) if path[depth] in i][0][path[depth]]
        else:
            raise TypeError(f"Unknown data type at {path}[{depth}]")

        cache[path[0 : depth + 1]] = queue
        depth += 1

    # For the final key, change its dictionary value to 'value'.
    if type(queue) is dict:
        queue[path[depth]] = value
    elif type(queue) is list:
        [i for i in flatten(queue) if path[depth] in i][0][path[depth]] = value
    else:
        raise TypeError(f"Unknown data type at {path}[{depth}]")


data = {
    'a': {
        'z': [
            {
                'x': 1,
                'y': [
                    [{'a': 10}, {'f': 21}],
                    {'m': 100},
                ]
            }
        ]
    },
    'm': {
        'x': 100
    }
}

setValue(data, ("a", "z", "x"), 900)
setValue(data, ("a", "z", "y", "a"), 901)
setValue(data, ("a", "z", "y", "f"), 902)
setValue(data, ("a", "z", "y", "m"), 903)
setValue(data, ("m", "x"), 904)

print(data)

The nested lists make traversal slightly tricky, but they can just be flattened. Looking up an item by value in a list will obviously be slower than looking up a key in a dictionary because you lose the hash table, but that's mitigated in this case by only having to do it once per node.

thariqfahry
  • 409
  • 3
  • 8