2

I'd like to check whether a dictionary is a subset of another dictionary recursively. Let's assume both dictionaries having builtin types as items.

I've seen there is already a very old thread Python: Check if one dictionary is a subset of another larger dictionary trying to solve something similar but not quite... Because none of the answers living there will serve for my purposes I've decided to post my own solution in there but it is still not fully complete, the below function is working ok for almost all cases but it fails miserably in cases when the subset has values that don't exist in the superset, ie:

def is_subset(superset, subset):
    for key, value in superset.items():
        if key not in subset:
            continue

        if isinstance(value, dict):
            if not is_subset(value, subset[key]):
                return False
        elif isinstance(value, str):
            if not subset[key] in value:
                return False
        elif isinstance(value, list):
            if not set(subset[key]) <= set(value):
                return False
        elif isinstance(value, set):
            if not subset[key] <= value:
                return False
        else:
            if not subset[key] == value:
                return False

    return True

superset = {'question': 'mcve', 'metadata': {}}
subset = {'question': 'mcve', 'metadata': {'author': 'BPL'}}

print(is_subset(superset, subset))

The function returns True but it should return False. So, how would you fix it?

BPL
  • 9,632
  • 9
  • 59
  • 117

4 Answers4

3

Your code logic is upside down. Notice how you take each element in the superset and continue if they are not in the subset. What you want to do is take each element in the subset and check that they are in the superset.

Here is a fixed version of you code.

def is_subset(superset, subset):
    for key, value in subset.items():
        if key not in superset:
            return False

        if isinstance(value, dict):
            if not is_subset(superset[key], value):
                return False

        elif isinstance(value, str):
            if value not in superset[key]:
                return False

        elif isinstance(value, list):
            if not set(value) <= set(superset[key]):
                return False
        elif isinstance(value, set):
            if not value <= superset[key]:
                return False

        else:
            if not value == superset[key]:
                return False

    return True

Here are some example of the function giving the correct result.

superset = {'question': 'mcve', 'metadata': {}}
subset = {'question': 'mcve', 'metadata': {'author': 'BPL'}}

is_subset(superset, subset) # False

superset = {'question': 'mcve', 'metadata': {'foo': {'bar': 'baz'}}}
subset = {'metadata': {'foo': {}}}

is_subset(superset, subset) # True

superset = {'question': 'mcve', 'metadata': {'foo': 'bar'}}
subset = {'question': 'mcve', 'metadata': {}, 'baz': 'spam'}

is_subset(superset, subset) # False
Olivier Melançon
  • 21,584
  • 4
  • 41
  • 73
  • I just find it weird that you chose to consider `{'foo': 'b'}` a subset of `{'foo': 'bar'}`. Other than that, you might want more complete behaviours for dictionaries nested in sets and lists, but this might also be out of scope for your use cases. – Olivier Melançon Mar 22 '18 at 03:18
  • It's not weird as long as you are aware of that behaviour. If it fits your usecase, then it's perfectly fine. – Olivier Melançon Mar 22 '18 at 03:25
1

Here is a solution that also properly recurses into lists and sets. (i changed the order of arguments, because that made more sense to me)

def is_subset(subset, superset):
    if isinstance(subset, dict):
        return all(key in superset and is_subset(val, superset[key]) for key, val in subset.items())
    
    if isinstance(subset, list) or isinstance(subset, set):
        return all(any(is_subset(subitem, superitem) for superitem in superset) for subitem in subset)

    if isinstance(subset, str):
        return subset in superset

    # assume that subset is a plain value if none of the above match
    return subset == superset

When using python 3.10, you can use python's new match statements to do the typechecking:

def is_subset(subset, superset):
    match subset:
        case dict(_): return all(key in superset and is_subset(val, superset[key]) for key, val in subset.items())
        case list(_) | set(_): return all(any(is_subset(subitem, superitem) for superitem in superset) for subitem in subset)
        case str(_): return subset in superset
        # assume that subset is a plain value if none of the above match
        case _: return subset == superset
Frederik Baetens
  • 781
  • 1
  • 9
  • 20
0

Just a guess, but I think because the dic returned by 'metadata' in the superset is empty, none of the if statements return true so you get the final return of True.

You could just do a check to see if the length of either dic is zero. If one is an not the other, then return false. Otherwise proceed with your recursive solution.

William
  • 83
  • 8
0

I didn't like the original solution much - it didn't work for some of the cases I had, as some of the comments said. Here's a more generalized solution:

def is_subvalue(supervalue, subvalue) -> bool:
    """Meant for comparing dictionaries, mainly.

    Note - I don't treat ['a'] as a subvalue of ['a', 'b'], or 'a' as a subvalue of 'ab'.
    For that behavior for a list or set, remove the line: `if len(supervalue) != len(subvalue): return False`
    For that behavior for a string, switch `subvalue == supervalue` to `subvalue in supervalue` for strings only.
    But NOT in this function, as it's meant to compare dictionaries and {'ab': 'a'} is not the same as {'a': 'a'}
    """
    if isinstance(subvalue, list) or isinstance(subvalue, set):
        if isinstance(subvalue, list) and not isinstance(supervalue, list):
            return False
        if isinstance(subvalue, set) and not isinstance(supervalue, set):
            return False
        if len(supervalue) != len(subvalue):
            return False
        return all([is_subvalue(supervalue[i], subvalue[i]) for i in range(len(subvalue))])
    if isinstance(subvalue, dict):
        if not isinstance(supervalue, dict):
            return False
        for key in subvalue:
            if key not in supervalue or not is_subvalue(supervalue[key], subvalue[key]):
                return False
        return True
    # all other types.
    return supervalue == subvalue

Here's some tests for it:

assert is_subvalue(None, None)
assert is_subvalue(1, 1)
assert is_subvalue('1', '1')
assert is_subvalue([], [])
assert is_subvalue({}, {})
assert is_subvalue(['1'], ['1'])
assert not is_subvalue(['1'], ['1', '2']) and not is_subvalue(['1', '2'], ['1'])
assert is_subvalue({'a': 'b'}, {'a': 'b'})
assert not is_subvalue({'ab': 'b'}, {'a': 'b'})
assert is_subvalue({'a': ['b']}, {'a': ['b']})
assert is_subvalue({'a': ['b', 'c']}, {'a': ['b', 'c']})
# tests for ensuring more complex dictionary checks work
assert not is_subvalue({'a': ['b', 'c']}, {'a': ['b']})
assert not is_subvalue({'a': 'b'}, {'a': 'b', 'b': 'c'})
assert is_subvalue({'a': 'b', 'b': 'c', 'c': 'd'}, {'a': 'b', 'b': 'c'})
assert not is_subvalue({'a': 'b', 'b': 'c', 'c': 'd'}, {'a': 'b', 'b': {'c': 'c'}})
assert is_subvalue({'a': 'b', 'b': {'c': 'c'}, 'c': 'd'}, {'a': 'b', 'b': {'c': 'c'}})
assert is_subvalue({'a': 'b', 'b': ['c', 'c'], 'c': 'd'}, {'a': 'b', 'b': ['c', 'c']})
Jacob C.
  • 1
  • 1