I wrote a function to find the keys that differ between two nested dictionaries. I was heavily inspired by this answer.

def find_diff_keys(d1: dict, d2: dict, not_included_keys: list = [], path=""):
    for k in d1:
        if k in d2:
            if type(d1[k]) is dict:
                find_diff_keys(d1[k],d2[k],not_included_keys, "%s.%s" % (path, k) if path else k)
        else:
            if type(d1[k]) is dict:
                not_included_keys.append("%s" % path if path else k)  # For root values
                for sub_k in d1[k]:
                    not_included_keys.append("%s.%s" % (k, sub_k))
                continue
            not_included_keys.append("%s.%s" % ("%s" % path if path else "", k))
    return not_included_keys

Now let's assume we have two nested dictionaries, a and b. To get the diff of these two dictionaries I would run the following. Note that I did not pass an empty list as the not_included_keys parameter:

print("The following keys are included in A but not in B:")
diff_a_to_b = find_diff_keys(ini_dict_a, ini_dict_b)
pprint(diff_a_to_b)
print("")
print("The following keys are included in B but not in A:")
diff_b_to_a = find_diff_keys(ini_dict_b, ini_dict_a)
pprint(diff_b_to_a)

The output of this would be :

The following keys are included in A but not in B:
['FAIL.WEAREDOINGSOMETHING',
 'FAIL.ehh',
 'FAIL.dsa',
 'FAIL.fds',
 'FAIL.gd',
 'FAIL.ewq',
 'TESTVALUE',
 'TESTVALUE.Name',
 'TESTSUBSUB.TestValues2',
 'TESTSUBSUB.TestValues2ltaHz']

The following keys are included in B but not in A:
['FAIL.WEAREDOINGSOMETHING',
 'FAIL.ehh',
 'FAIL.dsa',
 'FAIL.fds',
 'FAIL.gd',
 'FAIL.ewq',
 'TESTVALUE',
 'TESTVALUE.Name',
 'TESTSUBSUB.TestValues2',
 'TESTSUBSUB.TestValues2ltaHz']

Now if I run the function with the empty list as parameter :

diff_a_to_b = find_diff_keys(ini_dict_a, ini_dict_b, [])
diff_b_to_a = find_diff_keys(ini_dict_b, ini_dict_a, [])

The output will be :

The following keys are included in A but not in B:
['FAIL.WEAREDOINGSOMETHING',
 'FAIL.ehh',
 'FAIL.dsa',
 'FAIL.fds',
 'FAIL.gd',
 'FAIL.ewq',
 'TESTVALUE',
 'TESTVALUE.Name',
 'TESTSUBSUB.TestValues2',
 'TESTSUBSUB.TestValues2ltaHz',
 'TESTSUBSUB.TestValues2eltaHz']

The following keys are included in B but not in A:
['PointsConfig.130324',
 'ResetAlarm.NotEnabledValue',
 'ResetAlarm.SomeValue',
 'PointsConfig.13032402.Mains.Some_small_difference']

Can someone explain why, in the first approach, Python just reuses the old list, whilst in the second approach the function does what it is supposed to?

canTuerk

1 Answer

Don't use mutable objects (lists, dicts, etc.) as default arguments.

Do this instead.

def append_to(element, to=None):
    if to is None:
        to = []
    to.append(element)
    return to
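
Applied to the function from the question, the same pattern might look like the sketch below. This is not a drop-in copy of the original: the way dotted paths are built here is an assumption about the intended output, and the sample dictionaries `a` and `b` are made up for illustration.

```python
def find_diff_keys(d1, d2, not_included_keys=None, path=""):
    """Collect dotted paths of keys present in d1 but missing from d2."""
    if not_included_keys is None:
        # A fresh list is created on every top-level call.
        not_included_keys = []
    for k, v in d1.items():
        full = f"{path}.{k}" if path else str(k)
        if k not in d2:
            not_included_keys.append(full)
            if isinstance(v, dict):
                # Assumption: also report the direct children of a missing dict.
                not_included_keys.extend(f"{full}.{sub_k}" for sub_k in v)
        elif isinstance(v, dict) and isinstance(d2[k], dict):
            # Recursive calls pass the list explicitly, so results accumulate.
            find_diff_keys(v, d2[k], not_included_keys, full)
    return not_included_keys


# Hypothetical sample data, not the dictionaries from the question.
a = {"x": 1, "sub": {"y": 2, "z": 3}, "only_a": {"p": 1}}
b = {"x": 1, "sub": {"y": 2}, "only_b": 0}

print(find_diff_keys(a, b))  # ['sub.z', 'only_a', 'only_a.p']
print(find_diff_keys(b, a))  # ['only_b']
```

Each top-level call now starts with its own list, so the second result no longer carries over entries from the first.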

Why? A list or dict is mutable, so every call that relies on the default ends up modifying the same object. Python evaluates default argument values once, when the function is defined, not each time it is called, and stores the result on the function object.
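
To see the shared default in action, here is a small sketch that deliberately uses the problematic `to=[]` form of `append_to`. `__defaults__` is the standard attribute where Python keeps a function's default values.

```python
def append_to(element, to=[]):  # deliberately buggy: mutable default
    to.append(element)
    return to


first = append_to(1)
second = append_to(2)

# Both calls returned the very same list object.
print(first is second)          # True
print(first)                    # [1, 2]

# The default stored on the function has been mutated as well.
print(append_to.__defaults__)   # ([1, 2],)
```

This is what happens in the question: both calls without a third argument append to one list, so the second result also contains everything the first call found.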

For more info: https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments

ShadowRanger
uzumaki
  • The alternative approach is to continue using the default `to=[]` and just put `to = list(to)` as the first line (so you never modify an OP's `list` by side-effect, and they can pass non-`list` iterables that are always coerced to a `list` so you can use them as such). Receiving a default `[]` is always a little bit of code smell, but if you *immediately* copy it like that, the maintainers won't have to look far to confirm it's being used safely. – ShadowRanger May 11 '22 at 13:46
  • yeah, that works too. How that looks reference wise OP (it's a shallow copy): https://pythontutor.com/visualize.html#code=test%20%3D%20%5B%5B2%5D%5D%0A%0Atest2%20%3D%20list%28test%29%0A&cumulative=false&curInstr=2&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false – uzumaki May 11 '22 at 13:50
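
The copy-on-entry alternative from the comments could be sketched as follows. The variable names `mine`, `nested` and `copied` are only for illustration; the last part shows what "shallow copy" means in practice.

```python
def append_to(element, to=[]):
    # Copy immediately: neither the default [] nor the caller's object is mutated.
    to = list(to)
    to.append(element)
    return to


mine = [1]
result = append_to(2, mine)
print(mine)                    # [1]  (caller's list is untouched)
print(result)                  # [1, 2]
print(append_to(3, (1, 2)))    # [1, 2, 3]  (any iterable is coerced to a list)
print(append_to(1), append_to(1))  # [1] [1]  (the default stays empty)

# list(...) is a shallow copy: inner objects are still shared.
nested = [[2]]
copied = list(nested)
copied[0].append(3)
print(nested)                  # [[2, 3]]
```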