3
a_standard = {
    'section1': {
        'category1': 1,
        'category2': 2
    },
    'section2': {
        'category1': 1,
        'category2': 2
    }

}

a_new = {
    'section1': {
        'category1': 1,
        'category2': 2
    },
    'section2': {
        'category1': 1,
        'category2': 3
    }

}

I want to find the difference between a_standard and a_new. Here the only difference is at a_new['section2']['category2'], where the value is 3 instead of 2.

Should I convert each dict to a set and take the difference, or loop over the dicts and compare them?

Steve
    Possible duplicate of [Comparing Python dictionaries and nested dictionaries](https://stackoverflow.com/questions/27265939/comparing-python-dictionaries-and-nested-dictionaries) – usernamenotfound Feb 06 '18 at 22:11

3 Answers

4

There is a library called deepdiff that has a lot of options, but I find it to be somewhat unintuitive.

Here's a recursive function that I often use to compute diffs during my unit tests. This goes a bit beyond what the question asks, because I take care of the case of lists being nested as well. I hope you'll find it useful.

Function definition:

from copy import deepcopy


def deep_diff(x, y, parent_key=None, exclude_keys=(), epsilon_keys=()):
    """
    Take the deep diff of JSON-like dictionaries

    No warranties when keys, or values are None

    """
    EPSILON = 0.5
    rho = 1 - EPSILON

    if x == y:
        return None

    if parent_key in epsilon_keys:
        xfl, yfl = float_or_None(x), float_or_None(y)
        if xfl and yfl and xfl * yfl >= 0 and rho * xfl <= yfl and rho * yfl <= xfl:
            return None

    if type(x) != type(y) or type(x) not in [list, dict]:
        return x, y

    if type(x) == dict:
        d = {}
        for k in x.keys() ^ y.keys():
            if k in exclude_keys:
                continue
            if k in x:
                d[k] = (deepcopy(x[k]), None)
            else:
                d[k] = (None, deepcopy(y[k]))

        for k in x.keys() & y.keys():
            if k in exclude_keys:
                continue

            next_d = deep_diff(x[k], y[k], parent_key=k, exclude_keys=exclude_keys, epsilon_keys=epsilon_keys)
            if next_d is None:
                continue

            d[k] = next_d

        return d if d else None

    # assume a list:
    d = [None] * max(len(x), len(y))
    flipped = False
    if len(x) > len(y):
        flipped = True
        x, y = y, x

    for i, x_val in enumerate(x):
        if flipped:
            d[i] = deep_diff(y[i], x_val, parent_key=i, exclude_keys=exclude_keys, epsilon_keys=epsilon_keys)
        else:
            d[i] = deep_diff(x_val, y[i], parent_key=i, exclude_keys=exclude_keys, epsilon_keys=epsilon_keys)

    for i in range(len(x), len(y)):
        d[i] = (y[i], None) if flipped else (None, y[i])

    return None if all(v is None for v in d) else d

# We need this helper function as well:
def float_or_None(x):
    try:
        return float(x)
    except (TypeError, ValueError):  # float() raises TypeError for None, lists and dicts
        return None

Usage:

>>> deep_diff(a_standard, a_new)

{'section2': {'category2': (2, 3)}}

I think the output is a little more intuitive than the other answers.

In unit tests I'll do something like:

import json

diff = deep_diff(expected_out, out, exclude_keys=["flickery1", "flickery2"])
assert diff is None, json.dumps(diff, indent=2)
  • Thanks for this snippet, it was the format I needed! I do have one issue with it though that I have no idea how to fix, when comparing lists it seems to add None to the end of the list if the elements are dicts. Hope that makes sense! Thanks again! – geekscrap Dec 07 '20 at 20:01
  • Actually, I say that, it seems to add a list of dictionaries at certain points. – geekscrap Dec 07 '20 at 20:28
  • There were some bugs in that code. I am copying a better version in now. – Zephaniah Grunschlag Dec 09 '20 at 00:45
  • I hope that fixed your problem, but if it didn't, I'd appreciate an example. – Zephaniah Grunschlag Dec 09 '20 at 00:49
  • Hi Zephaniah! Just managed to get round to making an example of the issue I'm facing! At the bottom of the script, you will see the results (so you don't have to run it). I appreciate the help so far, but I just can't follow your code, recursive funcs aren't my thing... https://gist.github.com/geekscrapy/1e2d4409cb2c2c74ec85592b1d3e70d4#file-gistfile1-txt – geekscrap Dec 26 '20 at 18:08
  • Hi geekscrap. Thanks for the great example. I responded there with more details, but the upshot is that it's working as I expect it to, though possibly not the way you would like. Notice that the code has the comment `No warranties when keys, or values are None`. This hints at the fact that you can't really distinguish between a value that is `None` and a key that is actually missing. I didn't see an elegant workaround, but I'm open to suggestions. – Zephaniah Grunschlag Dec 27 '20 at 02:49
  • FYI - Just wanted to follow up and say that the issue was [resolved off thread](https://gist.github.com/geekscrapy/1e2d4409cb2c2c74ec85592b1d3e70d4#file-gistfile1-txt). So I still stand by my solution above. – Zephaniah Grunschlag Mar 13 '21 at 14:44
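
The ambiguity raised in the comments above (a value that is `None` versus a key that is missing) can be avoided with a dedicated sentinel object. What follows is only a minimal sketch under stated assumptions, not the answer's code: `MISSING` and `dict_diff` are hypothetical names, and the sketch handles nested dicts only (no lists, no `exclude_keys`, no `epsilon_keys`).

```python
class _Missing:
    """Hypothetical sentinel type: marks a key that is absent on one side."""

    def __repr__(self):
        return "MISSING"


MISSING = _Missing()


def dict_diff(x, y):
    """Return a nested diff of two dicts, or None if they are equal."""
    if x == y:
        return None
    if not (isinstance(x, dict) and isinstance(y, dict)):
        return (x, y)

    d = {}
    for k in x.keys() | y.keys():
        # A missing key is reported as MISSING, so it cannot be confused with None.
        sub = dict_diff(x.get(k, MISSING), y.get(k, MISSING))
        if sub is not None:
            d[k] = sub
    return d or None


print(dict_diff({'a': None, 'b': 1}, {'b': 1}))  # {'a': (None, MISSING)}
```

With this approach a key whose value really is `None` shows up as `(None, MISSING)` when the other side lacks the key, at the cost of a diff that is no longer directly JSON-serializable.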
3

You can use recursion:

a_standard = {
'section1': {
    'category1': 1,
    'category2': 2
},
'section2': {
    'category1': 1,
    'category2': 2
 }

}

a_new = {
'section1': {
    'category1': 1,
    'category2': 2
},
'section2': {
    'category1': 1,
    'category2': 3
 }

}
def differences(a, b, section=None):
    return [(c, d, g, section) if all(not isinstance(i, dict) for i in [d, g]) and d != g else None if all(not isinstance(i, dict) for i in [d, g]) and d == g else differences(d, g, c) for [c, d], [h, g] in zip(a.items(), b.items())]

n = list(filter(None, [i for b in differences(a_standard, a_new) for i in b]))  # list() is needed in Python 3, where filter() is lazy
print(n)

Output:

[('category2', 2, 3, 'section2')]

Which yields the key corresponding to the unequal values.

Edit: without list comprehension:

def differences(a, b, section=None):
    # Assumes both dicts list the same keys in the same order.
    for [c, d], [h, g] in zip(a.items(), b.items()):
        if not isinstance(d, dict) and not isinstance(g, dict):
            if d != g:
                yield (c, d, g, section)
        else:
            # Pass nested differences through as whole tuples.
            yield from differences(d, g, c)

print(list(differences(a_standard, a_new)))

Output:

[('category2', 2, 3, 'section2')]

This solution uses a generator (hence the yield statement), which produces values lazily, one at a time, and remembers only where it left off. Passing the generator to list() collects all the values. yield makes it easier to accumulate the differences and removes the need for an extra accumulator parameter or a global variable.
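
To illustrate the lazy behaviour described above, here is a small self-contained sketch. The name `iter_differences` is hypothetical, and unlike the answer's code it looks up shared keys rather than zipping `items()`, so it does not depend on key order.

```python
def iter_differences(a, b, section=None):
    """Lazily yield (key, old, new, section) for each differing leaf value."""
    for key in a.keys() & b.keys():
        old, new = a[key], b[key]
        if isinstance(old, dict) and isinstance(new, dict):
            # Delegate to the nested generator; nothing is stored in between.
            yield from iter_differences(old, new, key)
        elif old != new:
            yield (key, old, new, section)


a_standard = {
    'section1': {'category1': 1, 'category2': 2},
    'section2': {'category1': 1, 'category2': 2},
}
a_new = {
    'section1': {'category1': 1, 'category2': 2},
    'section2': {'category1': 1, 'category2': 3},
}

gen = iter_differences(a_standard, a_new)
first = next(gen, None)  # runs only until the first difference is found
print(first)             # ('category2', 2, 3, 'section2')
print(list(gen))         # [] : no further differences
```

Using `next()` is handy when you only need to know whether the dicts differ at all, because the comparison stops at the first mismatch.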

Ajax1234
2

You can do this, assuming both dicts have the same keys:

def find_diff(dict1, dict2):
    differences = []
    for key in dict1:
        if isinstance(dict1[key], dict):
            # Recurse and collect; returning here would skip the remaining keys.
            differences.extend(find_diff(dict1[key], dict2[key]))
        elif dict1[key] != dict2[key]:
            differences.append((key, dict1[key], dict2[key]))
    return differences

I’m typing on my phone right now, so sorry if the syntax is a little messed up.
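
Because only the innermost key is reported, two sections with the same category name would be indistinguishable in the result. One possible variation, sketched here under the same assumption that both dicts share their keys, records the full key path instead. The name `find_diff_paths` and the `path` parameter are hypothetical.

```python
def find_diff_paths(dict1, dict2, path=()):
    """Return a list of (path, value1, value2) for every differing leaf."""
    differences = []
    for key in dict1:
        here = path + (key,)
        v1, v2 = dict1[key], dict2[key]  # assumes dict2 has the same keys
        if isinstance(v1, dict) and isinstance(v2, dict):
            differences.extend(find_diff_paths(v1, v2, here))
        elif v1 != v2:
            differences.append((here, v1, v2))
    return differences


a_standard = {
    'section1': {'category1': 1, 'category2': 2},
    'section2': {'category1': 1, 'category2': 2},
}
a_new = {
    'section1': {'category1': 1, 'category2': 2},
    'section2': {'category1': 1, 'category2': 3},
}

print(find_diff_paths(a_standard, a_new))
# [(('section2', 'category2'), 2, 3)]
```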

sist
CrizR