Python find if two dictionaries have the same partitioning of keys by value

Question

What is the most Pythonic way of doing the following: Suppose I have 2 dictionaries A and B. Now the regular python equality for dictionaries will check that the value and key is the same in each dictionary, and if this holds for every element of the dictionary they are equal. I want to modify this to consider a dictionary equal if for all sets of keys having the same value in A, each element in that set will have the same value in B, but not necessarily the same one as in A.

Example:

A = {'A':1, 'B':4, 'C':1}
B = {'A':9, 'B':2, 'C':9}

Here A == B. Essentially this dictionary represents a set of sets, and I want to implement set equality over it.

My attempt

def eq(a,b):
    if not a.keys() == b.keys():
        return False
    for grouping in ({k for k in a.keys() if a[k] == v} for v in a.values()):
        if not len(set(b[x] for x in grouping)) == 1:
            return False
    return True

I don't really like this approach because it doesn't short circuit since the entire generator must be consumed to convert it to a set. The idea is to partition the first set into groups such that for every group every element in it will have the same value. Then I want to make sure that for each grouping the values of the elements of the grouping are the same in the other set.

Edit I'm sorry I couldn't explain it more clearly, I will give more examples. An easier way to think about it is this: I can convert any dictionary into a set of sets as follows:

A = {'A':3, 'B':3, 'C':3, 'R':4, 'T':4}
A = {{'A', 'B', 'C'}, {'R', 'T'}}
B = {'A':[], 'B':[], 'C':[], 'R':"", 'T':""}
B = {{'A', 'B', 'C'}, {'R', 'T'}}
A == B

I think you should reverse the dictionary with a value to list mapping. — pault, Jun 26 '19 at 02:40
he wants a function that finds key with same values in `A`, and then checks if the corresponding values in `B` are the same. if in `A` we have `'A'` and `'C'` with the same value, then if they (keys `'A'` and `'C'`) have the same values in 'B', return true. — בנימין כהן, Jun 26 '19 at 02:42
@pault I think this will cause pairs of {key: value} to be removed from the reversed dict. — בנימין כהן, Jun 26 '19 at 02:44
what kind of values will the input dictionaries have, will they be integers, lists, lists of lists etc? — Devesh Kumar Singh, Jun 26 '19 at 16:58

U13-Forward · Answer 1 · 2019-06-26T03:04:10.637

Some changes, I could only get to:

def eq(a,b):
    if not a.keys() == b.keys():
        return False
    for x, y in zip(a.values(), b.values()):
        if not sorted([key for key in a.keys() if a[key] == x]) == sorted([key for key in b.keys() if b[key] == y]):
            return False
    return True

But a little cleaner, would be:

def eq(a,b):
    d1 = {}
    d2 = {}
    for (x, y), (i, j) in zip(a.items(), b.items()):
        d1.setdefault(y, []).append(x)
        d2.setdefault(j, []).append(i)
    return [sorted(i) for i in d1.values()] == [sorted(i) for i in d2.values()]

Or shorter:

def eq(a,b):
    d1 = {y: sorted([i for i in a.keys() if a[i] == y]) for x, y in a.items()}
    d2 = {y: sorted([i for i in b.keys() if b[i] == y]) for x, y in b.items()}
    return list(d1.values()) == list(d2.values())

I don't think `a.keys() == b.keys()` will work since the keys may not be in the same order. — pault, Jun 26 '19 at 14:44

Devesh Kumar Singh · Answer 2 · 2019-06-26T15:30:08.823

One approach based on @pault suggestion is to make a dictionary of values to keys, and then see if the values of the two dictionaries group together in the same fashion.

I am also sorting the values of the reversed dictionary to take care of order, as well the final list of values when comparing them

from collections import defaultdict

def eq(A, B):

    rev_A = defaultdict(list)
    rev_B = defaultdict(list)

    #Create the reverse dictionary
    for k, v in A.items():
        #If v is a list, convert it to tuple to make a hashable key
        if isinstance(v, list):
            rev_A[tuple(v)].append(k)
        else:
            rev_A[v].append(k)

    for k, v in B.items():
        if isinstance(v, list):
            rev_B[tuple(v)].append(k)
        else:
            rev_B[v].append(k)

    #Sort the values of reverse dictionary
    for k, v in rev_A.items():
        rev_A[k] = sorted(v)

    for k, v in rev_B.items():
        rev_B[k] = sorted(v)

    #See if the values of both dictionaries group in same fashion
    return list(sorted(rev_A.values())) == list(sorted(rev_B.values()))

A = {'A':1, 'B':4, 'C':1}
B = {'A':9, 'B':2, 'C':9}

print(eq(A,B))

A = {'A':3, 'B':3, 'C':3, 'R':4, 'T':4}
B = {'C':8, 'R':6, 'T':6, 'A':8, 'B':8}

print(eq(A,B))

A = {'A':3, 'B':3, 'C':3, 'R':4, 'T':4}

B = {'A':[], 'B':[], 'C':[], 'R':"", 'T':""}
print(eq(A,B))

The output will be

True
True
True

That doesn't work in all cases- for example, it breaks for `A = {'A':3, 'B':[[]], 'C':3, 'R':4, 'T':4}; B = {'A':[], 'B':[], 'C':[], 'R':"", 'T':""}` — pault, Jun 26 '19 at 16:15
yeah, this is a bit tricky to solve, to always ensure that dictionaries are hashable in the approach I have, for now I will leave it as is — Devesh Kumar Singh, Jun 26 '19 at 16:17
@pault I guess one can increase the likelihood a dataset is hashable by json encoding it, but at some point I start to feel like trying to show uniqueness for a bunch of unhashable objects is a losing proposition. — Imperishable Night, Jun 26 '19 at 16:19

Imperishable Night · Answer 3 · 2019-06-26T18:07:26.190

Edit: Fixed the problem pointed by @pault. Although that specific input now throws an error due to values in b not being hashable...

Since OP mentioned that their original approach does not short circuit, I will try to give a method that does. This approach does require the values in a and b to be hashable.

I haven't profiled this, though. It probably depends on the nature of the inputs anyway. Specifically, if values in a or b can be hashed, but it's very inefficient, then of course this approach would suffer.

Another thought: If the two dicts are either equal (under this definition) or close to, then this implementation will need to compare all elements in a python loop, which would probably be slower than the other implementations. However, if they are possibly wildly different, allowing the short-circuit to do its work, then this approach might show an advantage.

Edit: Added a parameter encoding to forcibly hash some objects. Of course that would have some side effects depending on the encoding used, like [] and () being considered equal, and equal dicts with different order being considered unequal.

def eq(a, b, encoding = None):
    if len(a) != len(b): return False
    mapping = {}
    value_set = set()
    for k, v_a in a.items():
        v_b = b.get(k)
        if v_b is None: return False
        if encoding: v_a, v_b = encoding(v_a), encoding(v_b)
        if v_a in mapping:
            if mapping[v_a] != v_b: return False
        elif v_b in value_set: return False
        else:
            mapping[v_a] = v_b
            value_set.add(v_b)
    return True

Usage:

import json
A = {'A':3, 'B':3, 'C':3, 'R':4, 'T':4}
B = {'A':[], 'B':[], 'C':[], 'R':"", 'T':""}
print(eq(A, B, encoding = json.dumps))

This doesn't work if `A = {'A':3, 'B':2, 'C':3, 'R':4, 'T':4}, B = {'A':[], 'B':[], 'C':[], 'R':"", 'T':""}` — pault, Jun 26 '19 at 15:26
@pault Oh, you're right, I forgot to verify that it's a bijection too. Feels like it would be hard to make efficient if values in either dictionary isn't hashable. — Imperishable Night, Jun 26 '19 at 15:59

pault · Answer 4 · 2019-06-26T17:55:31.320

1

The other answers break if the values are not hashable. Another approach is to group keys based on the values, and check if the groups are equal for both dictionaries.

One way to do this is to use itertools.groupby to group the keys, but that will require the items to be sorted first. However, python 3 does not support sorting a heterogeneous list, so we will have to use one of the answers from How can I get 2.x-like sorting behaviour in Python 3.x?.

I picked @Fred's answer since we don't care about the sort order and it is the simplest to code.

from itertools import groupby
from operator import itemgetter
from numbers import Real
from decimal import Decimal

# from https://stackoverflow.com/a/26663384/5858851
def motley(value):
    numeric = Real, Decimal
    if isinstance(value, numeric):
        typeinfo = numeric
    else:
        typeinfo = type(value)

    try:
        x = value < value
    except TypeError:
        value = repr(value)

    return repr(typeinfo), value

def eq(A, B):
    def get_key_groups(X):
        return set(
            tuple(map(itemgetter(0), g)) 
            for i, g in groupby(
                sorted(X.items(), key=lambda x: motley(x[1])), 
                key=itemgetter(1)
            )
        )
    return get_key_groups(A) == get_key_groups(B)

Some tests:

A = {'A':1, 'B':4, 'C':1}
B = {'A':9, 'B':2, 'C':9}
eq(A, B)
#True

A = {'A':3, 'B':3, 'C':3, 'R':4, 'T':4}
B = {'A':[], 'B':[], 'C':[], 'R':"", 'T':""}
eq(A, B)
#True

A = {'A':3, 'B':2, 'C':3, 'R':4, 'T':4}
B = {'A':[], 'B':[], 'C':[], 'R':"", 'T':""}
eq(A, B)
#False

edited Jun 26 '19 at 17:55

answered Jun 26 '19 at 14:34

pault

41,343
15
107
149

When I tried your test I get an error (`'<' not supported between instances of 'str' and 'list'`; half expected). Is it a python version issue? – Imperishable Night Jun 26 '19 at 16:27
`Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21) [Clang 6.0 (clang-600.0.57)] on darwin`. – Imperishable Night Jun 26 '19 at 16:32
After thinking about it, I think this is a good change. Python might be a dynamically typed language, but that doesn't mean you should be able to compare uncorrelated types without thinking about why you want to do it. And in practical scenarios, if you can think of a reason to do it, you can probably find a suitable encoding that makes your program's behavior more well-defined. – Imperishable Night Jun 26 '19 at 16:54
1

@ImperishableNight found the answer: https://docs.python.org/3.0/whatsnew/3.0.html#ordering-comparisons – pault Jun 26 '19 at 16:56
Yes, I expected that to be one of the Python 3 changes. More surprised that they didn't provide an "official workaround" for it, but that's probably because it's really use case dependent. – Imperishable Night Jun 26 '19 at 17:01
@ImperishableNight I fixed the solution based on https://stackoverflow.com/questions/26575183/how-can-i-get-2-x-like-sorting-behaviour-in-python-3-x – pault Jun 26 '19 at 17:49

Python find if two dictionaries have the same partitioning of keys by value

4 Answers4