Update or add a new field in a nested dictionary in Python

Question

I am trying to update a key, or add a new key with its value, in a nested Python dictionary. The dictionary comes from an API. On the first call, the API responds with the initial dictionary, which looks like this:

status_from_db = {
    "management": {
        "declarations": {
            "activations": [
                {
                    "active": True,
                    "identifier": "a9c509ea-3e03-4877-846c-208e82ac2b04",
                    "server-token": "5a3d2ed9-b67e-5ebb-bbed-b4fcd79f699c",
                    "valid": "valid"
                }
            ],
            "assets": [
                {
                    "active": True,
                    "identifier": "664a311d-cead-400a-8659-2c27facd3c15",
                    "server-token": "9ca2ad23-bbcc-5651-bf7e-e861f00b92a5",
                    "valid": "valid"
                }
            ],
            "configurations": [
                {
                    "active": True,
                    "identifier": "6fb97864-e657-4600-a545-730d2e5a8a2d",
                    "server-token": "78b2239e-3617-55a4-9618-14564209cd56",
                    "valid": "valid"
                },
                {
                    "active": True,
                    "identifier": "819be2ec-fe0f-486b-87f6-b409e80053e2",
                    "server-token": "310ffc2f-50c6-591d-a730-f9d2954357b2",
                    "valid": "valid"
                },
                {
                    "active": True,
                    "identifier": "cbd9906c-046b-4586-bade-843aab5a385d",
                    "server-token": "b4975812-f2d7-5c8d-935b-239e888feed3",
                    "valid": "valid"
                }
            ],
            "management": []
        }
    },
    "mdm": {
        "app": [
            {
                "state": "prompting",
                "identifier": "com.netflix.Netflix"
            },
            {
                "state": "prompting",
                "identifier": "test"
            },
            {
                "state": "prompting",
                "identifier": "blabla"
            }
        ]
    },
    "passcode": {
        "is-compliant": True,
        "is-present": True
    }
}

I am trying to update it from a second Python dictionary whose keys are not known in advance. This input dict comes from the same API, which only sends the updated or added items. It doesn't send the whole dictionary, to save bandwidth. The values to update or add don't always contain the same keys. Sometimes the keys are already in the base dictionary and need to be updated, but sometimes they are completely new and need to be added to the initial dictionary.

It may look like this:

status_from_device = {
    "mdm": {
        "app": [
            {
                "removed": True,
                "identifier": "test"
            },
            {
                "state": "BLABLA",
                "identifier": "blabla"
            }
        ]
    },
    "passcode": {
        "is-present": False
    }
}

or like this:

status_from_device = {
    "device": {
        "model": {
            "identifier": "my model identifier"
        }
    }
}

For example, with the dictionaries above, I would like to update the correct fields in the nested dictionary without removing the keys already present, and get something like this as output:

status_from_db = {
    "management": {
        "declarations": {
            "activations": [
                {
                    "active": True,
                    "identifier": "a9c509ea-3e03-4877-846c-208e82ac2b04",
                    "server-token": "5a3d2ed9-b67e-5ebb-bbed-b4fcd79f699c",
                    "valid": "valid"
                }
            ],
            "assets": [
                {
                    "active": True,
                    "identifier": "664a311d-cead-400a-8659-2c27facd3c15",
                    "server-token": "9ca2ad23-bbcc-5651-bf7e-e861f00b92a5",
                    "valid": "valid"
                }
            ],
            "configurations": [
                {
                    "active": True,
                    "identifier": "6fb97864-e657-4600-a545-730d2e5a8a2d",
                    "server-token": "78b2239e-3617-55a4-9618-14564209cd56",
                    "valid": "valid"
                },
                {
                    "active": True,
                    "identifier": "819be2ec-fe0f-486b-87f6-b409e80053e2",
                    "server-token": "310ffc2f-50c6-591d-a730-f9d2954357b2",
                    "valid": "valid"
                },
                {
                    "active": True,
                    "identifier": "cbd9906c-046b-4586-bade-843aab5a385d",
                    "server-token": "b4975812-f2d7-5c8d-935b-239e888feed3",
                    "valid": "valid"
                }
            ],
            "management": []
        }
    },
    "mdm": {
        "app": [
            {
                "state": "prompting",
                "identifier": "com.netflix.Netflix"
            },
            {
                "state": "prompting",
                "removed": True,
                "identifier": "test"
            },
            {
                "state": "BLABLA",
                "identifier": "blabla"
            }
        ]
    },
    "passcode": {
        "is-compliant": True,
        "is-present": False
    },
    "device": {
        "model": {
            "identifier": "my model identifier"
        }
    }
}

As you can see, the key "removed" was added to the correct item and the key "state" was updated. The new pair <key=device, value={"model": {"identifier": "my model identifier"}}> was also added. I know we have to walk through the whole initial dict recursively and find the place to do the update.

I am lost as to the correct way to write a recursive function that finds the right item to update or add based on the inputs.

I have tried several ways to write a recursive function, without success. I used sets to find which keys need to be added and which need to be updated, based on the base dictionary A (from the API's first response) and on the input dict B (which the API sends when something changes on its side): every key in B but not in A is new, and every key in both B and A has an update available. We don't need to handle the deletion of items from A.

If someone has an idea on how to solve the problem, it would save my day.

Thanks a lot.

Perhaps this can help: [How to update values in a nested dictionary?](https://stackoverflow.com/questions/73775750/how-to-update-values-in-a-nested-dictionary/73775916#73775916) then you can write `new_status = updated_in_depth(status_from_db, status_from_device)` — Stef, Jan 21 '23 at 08:47
Oops, no, sorry, that question treats the replacing dict differently than you. It can still be an inspiration but the code needs to be adapted. — Stef, Jan 21 '23 at 08:53
This question is perhaps more helpful: [How to merge dictionaries of dictionaries?](https://stackoverflow.com/questions/7204805/how-to-merge-dictionaries-of-dictionaries) — Stef, Jan 21 '23 at 08:57

Stef · Accepted Answer · 2023-01-21T09:28:26.700

You need to walk the two dicts recursively and simultaneously. You can use `isinstance` to determine whether an object is a dict, a list or a single item, and explore that object recursively accordingly.

Below, I wrote a function `merged_in_depth` that returns a new, updated dictionary without modifying either of the two input dictionaries. However, it doesn't make deep copies, so the result shares nested objects with its inputs: modifying the new or the old dictionary may modify the other, too. If that doesn't suit you, you can replace `a[k]` by `deepcopy(a[k])` in the code below (and likewise for the values taken from `b`), with `from copy import deepcopy` at the beginning of the code.
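To make the sharing caveat concrete, here is a small sketch that is independent of the function below. The helpers `shallow_merge` and `copying_merge` and the sample data are made up for illustration and only merge the top level. The point is that a merge which reuses the input values hands back the same nested objects, while one that deep-copies them does not.

```python
from copy import deepcopy

def shallow_merge(a, b):
    # Hypothetical helper: top-level merge that reuses the values of a and b.
    return {**a, **b}

def copying_merge(a, b):
    # Same top-level merge, but every value is deep-copied first.
    return {k: deepcopy(v) for k, v in {**a, **b}.items()}

new = {"device": {"model": "x"}}

old_shared = {"passcode": {"is-present": True}}
shared = shallow_merge(old_shared, new)
shared["passcode"]["is-present"] = False
print(old_shared["passcode"]["is-present"])  # False: the input changed through the result

old_copied = {"passcode": {"is-present": True}}
independent = copying_merge(old_copied, new)
independent["passcode"]["is-present"] = False
print(old_copied["passcode"]["is-present"])  # True: the input is untouched
```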

I used set operations to distinguish between the three types of keys in the two dictionaries:

keys_a, keys_b = set(a.keys()), set(b.keys())
keys_a, keys_b, keys_ab = keys_a - keys_b, keys_b - keys_a, keys_a & keys_b

Here `-` is `set.difference`, and `&` is `set.intersection`.

After the second line, `keys_a` contains the keys that are unique to `a`; `keys_b` contains the keys that are unique to `b`; and `keys_ab` contains the keys that are present in both dicts.
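As a quick illustration of that partition, here is the same idea on two small made-up dicts. The names `only_a`, `only_b` and `both` are used instead of reassigning `keys_a` and `keys_b`, purely to keep the three groups easy to tell apart.

```python
a = {"management": 1, "mdm": 2, "passcode": 3}
b = {"mdm": 20, "passcode": 30, "device": 40}

keys_a, keys_b = set(a.keys()), set(b.keys())
only_a, only_b, both = keys_a - keys_b, keys_b - keys_a, keys_a & keys_b

# Sets are unordered, so sort them before printing.
print(sorted(only_a))  # ['management']
print(sorted(only_b))  # ['device']
print(sorted(both))    # ['mdm', 'passcode']
```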

from itertools import chain

def merged_in_depth(a, b):
    """Return a new structure equal to a, updated in depth with b."""
    if isinstance(a, dict) and isinstance(b, dict):
        # Partition the keys: only in a, only in b, and in both.
        keys_a, keys_b = set(a.keys()), set(b.keys())
        keys_a, keys_b, keys_ab = keys_a - keys_b, keys_b - keys_a, keys_a & keys_b
        return dict(chain(
            ((k, a[k]) for k in keys_a),                         # kept from a
            ((k, b[k]) for k in keys_b),                         # added from b
            ((k, merged_in_depth(a[k], b[k])) for k in keys_ab)  # merged recursively
        ))
    elif isinstance(a, list) and isinstance(b, list):
        # Match list items by their 'identifier' field (assumed present and unique).
        da = {x['identifier']: x for x in a}
        db = {y['identifier']: y for y in b}
        keys_a, keys_b = set(da.keys()), set(db.keys())
        keys_a, keys_b, keys_ab = keys_a - keys_b, keys_b - keys_a, keys_a & keys_b
        # Sets are unordered, so the order of the resulting list is not guaranteed.
        return list(chain(
            (da[k] for k in keys_a),
            (db[k] for k in keys_b),
            (merged_in_depth(da[k], db[k]) for k in keys_ab)
        ))
    elif isinstance(a, (dict, list)) or isinstance(b, (dict, list)):
        # A container on one side does not match what is on the other side.
        raise ValueError('The two dicts have different structures!')
    else:
        # Two single items: the value from b wins.
        return b

Testing:

status_from_db = {'management': {'declarations': {'activations': [{'active': True, 'identifier': 'a9c509ea-3e03-4877-846c-208e82ac2b04', 'server-token': '5a3d2ed9-b67e-5ebb-bbed-b4fcd79f699c', 'valid': 'valid'}], 'assets': [{'active': True, 'identifier': '664a311d-cead-400a-8659-2c27facd3c15', 'server-token': '9ca2ad23-bbcc-5651-bf7e-e861f00b92a5', 'valid': 'valid'}], 'configurations': [{'active': True, 'identifier': '6fb97864-e657-4600-a545-730d2e5a8a2d', 'server-token': '78b2239e-3617-55a4-9618-14564209cd56', 'valid': 'valid'}, {'active': True, 'identifier': '819be2ec-fe0f-486b-87f6-b409e80053e2', 'server-token': '310ffc2f-50c6-591d-a730-f9d2954357b2', 'valid': 'valid'}, {'active': True, 'identifier': 'cbd9906c-046b-4586-bade-843aab5a385d', 'server-token': 'b4975812-f2d7-5c8d-935b-239e888feed3', 'valid': 'valid'}], 'management': []}}, 'mdm': {'app': [{'state': 'prompting', 'identifier': 'com.netflix.Netflix'}, {'state': 'prompting', 'identifier': 'test'}, {'state': 'prompting', 'identifier': 'blabla'}]}, 'passcode': {'is-compliant': True, 'is-present': True}}

dev1 = {'mdm': {'app': [{'removed': True, 'identifier': 'test'}, {'state': 'BLABLA', 'identifier': 'blabla'}]}, 'passcode': {'is-present': False}}

dev2 = {'device': {'model': {'identifier': 'my model identifier'}}}

print( merged_in_depth(status_from_db, dev1) )
# {'management': {'declarations': {'activations': [{'active': True, 'identifier': 'a9c509ea-3e03-4877-846c-208e82ac2b04', 'server-token': '5a3d2ed9-b67e-5ebb-bbed-b4fcd79f699c', 'valid': 'valid'}], 'assets': [{'active': True, 'identifier': '664a311d-cead-400a-8659-2c27facd3c15', 'server-token': '9ca2ad23-bbcc-5651-bf7e-e861f00b92a5', 'valid': 'valid'}], 'configurations': [{'active': True, 'identifier': '6fb97864-e657-4600-a545-730d2e5a8a2d', 'server-token': '78b2239e-3617-55a4-9618-14564209cd56', 'valid': 'valid'}, {'active': True, 'identifier': '819be2ec-fe0f-486b-87f6-b409e80053e2', 'server-token': '310ffc2f-50c6-591d-a730-f9d2954357b2', 'valid': 'valid'}, {'active': True, 'identifier': 'cbd9906c-046b-4586-bade-843aab5a385d', 'server-token': 'b4975812-f2d7-5c8d-935b-239e888feed3', 'valid': 'valid'}], 'management': []}},
#  'passcode': {'is-compliant': True, 'is-present': False},
#  'mdm': {'app': [{'state': 'prompting', 'identifier': 'com.netflix.Netflix'}, {'state': 'prompting', 'removed': True, 'identifier': 'test'}, {'state': 'BLABLA', 'identifier': 'blabla'}]}}

print( merged_in_depth(status_from_db, dev2) )
# {'passcode': {'is-compliant': True, 'is-present': True},
#  'mdm': {'app': [{'state': 'prompting', 'identifier': 'com.netflix.Netflix'}, {'state': 'prompting', 'identifier': 'test'}, {'state': 'prompting', 'identifier': 'blabla'}]},
#  'management': {'declarations': {'activations': [{'active': True, 'identifier': 'a9c509ea-3e03-4877-846c-208e82ac2b04', 'server-token': '5a3d2ed9-b67e-5ebb-bbed-b4fcd79f699c', 'valid': 'valid'}], 'assets': [{'active': True, 'identifier': '664a311d-cead-400a-8659-2c27facd3c15', 'server-token': '9ca2ad23-bbcc-5651-bf7e-e861f00b92a5', 'valid': 'valid'}], 'configurations': [{'active': True, 'identifier': '6fb97864-e657-4600-a545-730d2e5a8a2d', 'server-token': '78b2239e-3617-55a4-9618-14564209cd56', 'valid': 'valid'}, {'active': True, 'identifier': '819be2ec-fe0f-486b-87f6-b409e80053e2', 'server-token': '310ffc2f-50c6-591d-a730-f9d2954357b2', 'valid': 'valid'}, {'active': True, 'identifier': 'cbd9906c-046b-4586-bade-843aab5a385d', 'server-token': 'b4975812-f2d7-5c8d-935b-239e888feed3', 'valid': 'valid'}], 'management': []}}, 'device': {'model': {'identifier': 'my model identifier'}}}
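Because the function goes through sets, the key order and list order of the result can differ from the input, as the outputs above show. If order matters, one possible variant is sketched below. The name `merged_in_depth_ordered` is made up, and this is an adaptation rather than the code above: it keeps the order of `a`, appends what is new in `b`, and deep-copies everything it reuses. It still assumes every dict inside a list has an `'identifier'` field. Unlike the set-based version, it does not collapse duplicate identifiers within `a`.

```python
from copy import deepcopy

def merged_in_depth_ordered(a, b):
    """Hypothetical order-preserving, deep-copying variant of merged_in_depth."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = {}
        for k, v in a.items():
            # Keys of a keep their position; shared keys are merged recursively.
            result[k] = merged_in_depth_ordered(v, b[k]) if k in b else deepcopy(v)
        for k, v in b.items():
            if k not in a:
                result[k] = deepcopy(v)  # new keys are appended at the end
        return result
    if isinstance(a, list) and isinstance(b, list):
        db = {y['identifier']: y for y in b}
        ids_a = {x['identifier'] for x in a}
        result = [
            merged_in_depth_ordered(x, db[x['identifier']])
            if x['identifier'] in db else deepcopy(x)
            for x in a
        ]
        # Items whose identifier is unknown to a are appended in b's order.
        result.extend(deepcopy(y) for y in b if y['identifier'] not in ids_a)
        return result
    if isinstance(a, (dict, list)) or isinstance(b, (dict, list)):
        raise ValueError('The two dicts have different structures!')
    return b

base = {
    'mdm': {'app': [{'state': 'prompting', 'identifier': 'com.netflix.Netflix'},
                    {'state': 'prompting', 'identifier': 'test'}]},
    'passcode': {'is-compliant': True, 'is-present': True},
}
update = {
    'mdm': {'app': [{'removed': True, 'identifier': 'test'}]},
    'passcode': {'is-present': False},
    'device': {'model': {'identifier': 'my model identifier'}},
}

merged = merged_in_depth_ordered(base, update)
print(list(merged))  # ['mdm', 'passcode', 'device']
```

The trade-off is a little more code and the cost of copying in exchange for a predictable order and a result that shares nothing with its inputs.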

Thank you. Your function indeed worked in most cases, but what happens if the input is `dev3 = {"management": { "declarations": { "activations": [],"assets": [], "configurations": [],"management": [] } }} `? It looks like you assume the list items always have an `"identifier"` key, but I can't guarantee what the API will reply... — jimjim32, Jan 24 '23 at 07:11
@jimjim32 Yes, I assumed the dicts in the lists always have identifiers. I have no idea how you want to handle a list without those identifiers. I'm not familiar with your database or API, all I know is what you wrote in your post, and in that post it looks like the only way to find which elements in `status_from_device` correspond to which elements in `status_from_db` is by comparing the `'identifier'` fields. — Stef, Jan 24 '23 at 08:56
@jimjim32 You ask *"Do you have any suggestion on how I can modify your function to make it work in that scenario?"*. What does "make it work" mean? What result would you want in that scenario? — Stef, Jan 24 '23 at 08:57
Also, you say *"the code will throw an error."* but I just tried with this `dev3` and my code does not throw an error. — Stef, Jan 24 '23 at 08:59
This would result in an error: `dev4 = {"management": { "declarations": { "activations": [],"assets": [], "configurations": [{'active': False, 'no "identifier" field in this dict': 'what to do?'}],"management": [] } }}`. But I don't know what result you want in that case. — Stef, Jan 24 '23 at 09:02
Hi, oh yes, sorry, you are right: when everything is empty, it doesn't trigger the error. The input I used that threw the error was similar to what you mentioned in your comment: `dev4 = {"management": { "declarations": { "activations": [],"assets": [], "configurations": [{'active': False}],"management": [] } }}`. I will accept your first reply as the solution. I guess it will work for my use cases for now, and as I don't have any control over what the API might send, I will adapt if it stops working in the future. Thank you so much for your help. — jimjim32, Jan 25 '23 at 02:11
@jimjim32 You can replace `{x['identifier']: x for x in a}` with `{x['identifier']: x for x in a if 'identifier' in x}` if you want to silently ignore dicts from list `a` that don't contain an `'identifier'` field. And likewise for `b`. But note that in many cases it's better to have a function that fails on weird inputs, than a function that silently "works approximately" without telling you something is wrong. — Stef, Jan 25 '23 at 09:05
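The two behaviours discussed in this last comment can be put side by side in a small sketch. The helper `index_by_identifier` and its `strict` flag are hypothetical names, not part of the answer's code. With `strict=True` it fails loudly on items that have no `'identifier'` field, and with `strict=False` it silently skips them.

```python
def index_by_identifier(items, strict=True):
    """Hypothetical helper: map each dict's 'identifier' to the dict itself."""
    if strict:
        missing = [x for x in items if 'identifier' not in x]
        if missing:
            raise ValueError(f"{len(missing)} list item(s) have no 'identifier' field")
    # In lenient mode, items without an 'identifier' are dropped from the index.
    return {x['identifier']: x for x in items if 'identifier' in x}

items = [{'active': True, 'identifier': 'abc'}, {'active': False}]

print(index_by_identifier(items, strict=False))
# {'abc': {'active': True, 'identifier': 'abc'}}

try:
    index_by_identifier(items)  # strict by default
except ValueError as err:
    print("rejected:", err)
```

Note that in lenient mode the skipped items would not appear in a merged list at all, which is exactly the kind of silent data loss the comment warns about.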
