0

I have 2 dictionaries in python (d1, d2) where I need to pass the missing "id" item from d2 to d1, ignoring any other differences (such as the extra "child" in d1). What effectively is needed, is that a result dictionary is just d1 with "id" items added. I have tried merging, but it did not work since either way I lose data.

d1 = {
    "parent": {
        "name": "Axl",
        "surname": "Doe",
        "children": [
            {
                "name": "John",
                "surname": "Doe"
            },                
            {
                "name": "Jane",
                "surname": "Doe",
                "children": [
                    {
                        "name": "Jim",
                        "surname": "Doe"
                    },
                    {
                        "name": "Kim",
                        "surname": "Doe"
                    }
                ]
            }
        ]
    }
}

d2 = {
    "parent": {
        "id": 1,
        "name": "Axl",
        "surname": "Doe",
        "children": [
            {
                "id": 2,
                "name": "John",
                "surname": "Doe"
            },
            {
                "id": 3,
                "name": "Jane",
                "surname": "Doe",
                "children": [
                    {
                        "id": 4,
                        "name": "Jim",
                        "surname": "Doe"
                    },
                    {
                        "id": 5,
                        "name": "Kim",
                        "surname": "Doe"
                    },
                    {
                        "id": 6
                        "name": "Bill",
                        "surname": "Doe"
                    },
                ]
            }
        ]
    }
}

result = {
"parent": {
    "id": 1,
    "name": "Axl",
    "surname": "Doe",
    "children": [
        {
            "id": 2,
            "name": "John",
            "surname": "Doe"
        },
        {
            "id": 3,
            "name": "Jane",
            "surname": "Doe",
            "children": [
                {
                    "id": 4,
                    "name": "Jim",
                    "surname": "Doe"
                },
                {
                    "id": 5,
                    "name": "Kim",
                    "surname": "Doe"
                }
            ]
        }
    ]
}

}

Any ideas?

unicorn
  • 139
  • 1
  • 9
  • Are the names unique? – Bharel Oct 06 '18 at 13:22
  • Actually no... This is an example dict, just to show the structure – unicorn Oct 06 '18 at 13:27
  • Are the names of children unique? That is, a single parent can only have 1 child of each `(name, surname)`? I'm just trying to see how can we solve this without matching the whole sequence of children and incurring `O(n^2)` cost. – Bharel Oct 06 '18 at 13:32
  • Yes that is correct. I mean that the names are not unique across different parents – unicorn Oct 06 '18 at 13:34

2 Answers2

2

I match children according to a key function, in this case "name" and "surname" attributes.

I then go over the id_lookup dict (named d2 in your example) and try to match each child with main_dict's children. If I find a match, I recurse into it.

In the end, main_dict (or d1 in your example) is filled with IDs :-)

import operator

root = main_dict["parent"]
lookup_root = id_lookup_dict["parent"]

keyfunc = operator.itemgetter("name", "surname")

def _recursive_fill_id(root, lookup_root, keyfunc):
    """Recursively fill root node with IDs

    Matches nodes according to keyfunc
    """
    root["id"] = lookup_root["id"]

    # Fetch children
    root_children = root.get("children")

    # There are no children
    if root_children is None:
        return

    children_left = len(root_children)

    # Create a dict mapping the key identifying a child to the child
    # This avoids a hefty lookup cost and requires a single iteration.
    children_dict = dict(zip(map(keyfunc, root_children), root_children))

    for lookup_child in lookup_root["children"]:
        lookup_key = keyfunc(lookup_child)
        matching_child = children_dict.get(lookup_key)

        if matching_child is not None:
            _recursive_fill_id(matching_child, lookup_child, keyfunc)

            # Short circuit in case all children were filled
            children_left -= 1
            if not children_left:
                break

_recursive_fill_id(root, lookup_root, keyfunc)
Bharel
  • 23,672
  • 5
  • 40
  • 80
  • Actually, I tried the one you deleted and it seems to work just fine! Do you think this is more efficient? The previous one is a more generic solution, to be honest. Thank you very much for this really quick workaround!! – unicorn Oct 06 '18 at 14:11
  • It's the same, just changed the names so it'll be more readable. – Bharel Oct 06 '18 at 14:12
1

I wished to add an iterative answer instead of the recursive answer, as it'll probably prove to be more efficient.

It will not reach any stack threshold and will be a bit faster:

import operator

root = main_dict["parent"]
lookup_root = id_lookup_dict["parent"]

keyfunc = operator.itemgetter("name", "surname")

def _recursive_fill_id(root, lookup_root, keyfunc):
    """Recursively fill root node with IDs

    Matches nodes according to keyfunc
    """
    matching_nodes = [(root, lookup_root)]

    while matching_nodes:
        root, lookup_root = matching_nodes.pop()
        root["id"] = lookup_root["id"]

        # Fetch children
        root_children = root.get("children")

        # There are no children
        if root_children is None:
            continue

        children_left = len(root_children)

        # Create a dict mapping the key identifying a child to the child
        # This avoids a hefty lookup cost and requires a single iteration.
        children_dict = dict(zip(map(keyfunc, root_children), root_children))

        for lookup_child in lookup_root["children"]:
            lookup_key = keyfunc(lookup_child)
            matching_child = children_dict.get(lookup_key)

            if matching_child is not None:
                matching_nodes.append((matching_child, lookup_child))

                # Short circuit in case all children were filled
                children_left -= 1
                if not children_left:
                    break


_recursive_fill_id(root, lookup_root, keyfunc)
Bharel
  • 23,672
  • 5
  • 40
  • 80