1

I'm trying to write efficient code that tells whether one dict is a subset of another. Both dicts have string keys and int values. For dict1 to count as a subset, it cannot contain any keys that are missing from dict2, and each of its values must be less than or equal to the value for the same key in dict2.

`test_dict.items() <= test_dict2.items()` almost worked, until I tested it with this:

test_dict = {
    'a':1,
    'c':2
}

test_dict2 = {
    'a':1,
    'b':2,
    'c':3
}

test_dict.items() <= test_dict2.items()

False

I did get something working, but I don't know how efficient it really is:

def test(request_totals, mongo_totals, max_limit=100):
    outdated = dict()

    # Verify that MongoDB contains no collections missing from request_totals
    extra_keys = set(mongo_totals) - set(request_totals)
    if extra_keys:
        raise AttributeError(
            f'mongo_totals does not appear to be a subset of request_totals. Found: {extra_keys}')

    # Record the missing range for every collection where MongoDB is behind
    shared_keys = set(request_totals).intersection(mongo_totals)
    for key in shared_keys:
        if request_totals[key] > mongo_totals[key]:
            outdated[key] = range(mongo_totals[key], request_totals[key])
        elif request_totals[key] < mongo_totals[key]:
            raise AttributeError(
                f'mongo_total for {key}: {mongo_totals[key]} exceeds request_totals for {key}: {request_totals[key]}')

    return outdated

test(request_totals, mongo_totals)

That seems like a lot of work just to do the comparison before creating an object that manages updates. Is there a better way to do this?

MrChadMWood
  • 113
  • 11
  • Some possible solutions [here](https://stackoverflow.com/questions/9323749/how-to-check-if-one-dictionary-is-a-subset-of-another-larger-dictionary) that may or may not meet your exact needs. – sj95126 Aug 05 '22 at 20:42

2 Answers

2
all(test_dict2.get(k, v-1) >= v
    for k, v in test_dict.items())
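
A brief note on how this works: `test_dict2.get(k, v-1)` falls back to `v-1` when `k` is missing from `test_dict2`, so the `>= v` test fails for any key that only `test_dict` has, and `all` stops at the first failure. A minimal sketch of the same check wrapped in a function (the name `is_subset` is just for illustration, and int values are assumed as in the question):

```python
def is_subset(small, big):
    """Return True if every key of `small` is in `big` with a value <= big's.

    Assumes int values, as in the question.
    """
    # A missing key falls back to v - 1, which can never be >= v,
    # so keys unique to `small` make the check fail.
    return all(big.get(k, v - 1) >= v for k, v in small.items())


test_dict = {'a': 1, 'c': 2}
test_dict2 = {'a': 1, 'b': 2, 'c': 3}

print(is_subset(test_dict, test_dict2))   # True
print(is_subset(test_dict2, test_dict))   # False: 'b' is missing from test_dict
print(is_subset({'a': 2}, test_dict2))    # False: 2 > 1
```

Because `all` short-circuits, this can return early when the first dict is not a subset.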

Kelly Bundy
  • 23,480
  • 7
  • 29
  • 65
0

You could try `collections.Counter`: it's efficient and clear. Note that comparing Counters with `<=` is a new feature, available since Python 3.10.

Counter is a dict subclass for counting hashable objects.

from collections import Counter
cd1 = Counter(test_dict)
cd2 = Counter(test_dict2)
print(cd1 <= cd2)
# True

# another example:
cd3 = Counter({'a': 2, 'b': 2, 'c': 3})
print(cd3 <= cd2)
# False
print(cd2 <= cd3)
# True
Daniel Hao
  • 4,922
  • 3
  • 10
  • 23
  • Thanks for your reply! What version of python are you using? For me, this returns `TypeError: '<=' not supported between instances of 'Counter' and 'Counter'` I'm on 3.9 – MrChadMWood Aug 05 '22 at 20:51
  • I would have loved to upgrade, but I was reading there's no support for BeautifulSoup just yet. Now that I think about it though, I didn't check if BS4 was available... whoops. Have you noticed any good reason to stay at 3.9 for production? – MrChadMWood Aug 05 '22 at 20:59
  • Py 3.10 has been around for about 1 year (Oct 2021), so there's no reason not to upgrade... but some places/projects might be slower. Does this post help you? – Daniel Hao Aug 05 '22 at 21:05
  • 1
    Good point. I was going to wait until 3.11 official release and just stay 1 version behind, but maybe I'll go ahead and upgrade over the weekend. Thanks for answering my questions! – MrChadMWood Aug 05 '22 at 21:06
  • Yes, I'm accepting this as the answer. Your solution appears to be valid as long as I'm utilizing current software, and I did not request code be compatible with older versions. So it makes sense to me that I should accept your answer. I'll upgrade as well. Thanks again. – MrChadMWood Aug 05 '22 at 21:09
  • 1
    They didn't downvote mine, so maybe it's about efficiency? (I suspect yours is faster if it is a subset but mine might be much faster if it isn't). Btw in <3.10, you could subtract and check if something is left over. – Kelly Bundy Aug 06 '22 at 11:13
  • Thanks for the feedback. I will check out the *timing* if I get some time this weekend... But again, *premature optimization is evil...* The first concern should be making it work, right? – Daniel Hao Aug 06 '22 at 11:16
  • Yeah, I don't really see it as an issue, I just don't see any other issue with it, either. – Kelly Bundy Aug 06 '22 at 11:18
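
The subtraction idea from the comments above ("in <3.10, you could subtract and check if something is left over") could be sketched roughly as follows. The function name `is_subset_counter` is hypothetical, and the sketch assumes non-negative int values. Counter subtraction keeps only positive counts, so an empty result means nothing in the first dict exceeds the second.

```python
from collections import Counter


def is_subset_counter(small, big):
    """Subset test via Counter subtraction; does not need Python 3.10.

    Assumes non-negative int values. A key whose count is 0 is treated
    the same as a missing key.
    """
    # Subtraction drops every count that is zero or negative, so the
    # result is empty when no count in `small` exceeds the one in `big`.
    return not (Counter(small) - Counter(big))


test_dict = {'a': 1, 'c': 2}
test_dict2 = {'a': 1, 'b': 2, 'c': 3}

print(is_subset_counter(test_dict, test_dict2))                    # True
print(is_subset_counter({'a': 2, 'b': 2, 'c': 3}, test_dict2))     # False
print(is_subset_counter({'x': 0}, {}))                             # True (see caveat)
```

One caveat: Counter treats a missing key as a count of zero, so a key that exists only in the first dict with value 0 still passes. If such keys must be rejected, an explicit key check is needed as well.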