6

I am trying to find a way to sort and compare two lists of dictionaries in Python 3.6. I ultimately just want list_dict_a and list_dict_b to compare with == and evaluate to True.

Here is what the data looks like:

list_dict_a = [
{'expiration_date': None, 'identifier_country': None, 'identifier_number': 'Male', 'identifier_type': 'Gender', 'issue_date': None},
{'expiration_date': None, 'identifier_country': 'VE', 'identifier_number': '1234567', 'identifier_type': 'Foo No.', 'issue_date': None}]

list_dict_b = [
{'identifier_country': 'VE', 'expiration_date': None, 'identifier_type': 'Foo No.', 'issue_date': None, 'identifier_number': '1234567'},
{'identifier_country': None, 'expiration_date': None, 'identifier_type': 'Gender', 'issue_date': None, 'identifier_number': 'Male'}]

The data is the same, but it arrives in a different order (I don't have any control over the initial order).

When I compare them directly, I get False: print("does this match anything",list_dict_a == list_dict_b)

Is this even possible to do?

unseen_damage

3 Answers

2

You can sort both lists before comparing them and compare the sorted results:

>>> list_dict_a = [
        {'expiration_date': None, 'identifier_country': None, 'identifier_number': 'Male', 'identifier_type': 'Gender', 'issue_date': None},
        {'expiration_date': None, 'identifier_country': 'VE', 'identifier_number': '1234567', 'identifier_type': 'Foo No.', 'issue_date': None}]

>>> list_dict_b = [
        {'identifier_country': 'VE', 'expiration_date': None, 'identifier_type': 'Foo No.', 'issue_date': None, 'identifier_number': '1234567'},
        {'identifier_country': None, 'expiration_date': None, 'identifier_type': 'Gender', 'issue_date': None, 'identifier_number': 'Male'}]

>>> list_dict_a == list_dict_b
False
>>> def key_func(d):
...     items = ((k, v if v is not None else '') for k, v in d.items())
...     return sorted(items)
...
>>> sorted(list_dict_a, key=key_func) == sorted(list_dict_b, key=key_func)
True

The order of the dicts within each list will then not matter.

A key function is required because dicts are not orderable: on its own, the sorting function cannot decide which of two dicts comes first. The key function tells it what to compare instead. The key for each dictionary is simply a sorted list of its (key, value) pairs.
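To see why the key is needed, here is a minimal sketch of what happens without one. The `records` list is made-up sample data, not the lists from the question.

```python
# Hypothetical sample data, used only for illustration.
records = [{'name': 'b'}, {'name': 'a'}]

try:
    sorted(records)  # no key: Python has to evaluate dict < dict
    sortable_without_key = True
except TypeError:
    # In Python 3, dicts do not support ordering comparisons.
    sortable_without_key = False

# With a key, only the keys (lists of tuples) are compared.
by_items = sorted(records, key=lambda d: sorted(d.items()))
```

Here `sortable_without_key` ends up False, while the second call sorts the dicts by their item lists.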

The key function calculates a key for each dict as follows:

>>> dict_a0 = list_dict_a[0]
>>> key_func(dict_a0)
[('expiration_date', ''), ('identifier_country', ''), ('identifier_number', 'Male'), ('identifier_type', 'Gender'), ('issue_date', '')]

Footnotes

For this list of (key, value) pairs to be comparable with the lists computed for other dicts, None values have to be converted to an empty string. In Python 3, None cannot be ordered against a string, so the conversion is what makes None values comparable with non-None values.
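A small sketch of the failure that the conversion avoids, using two one-element key lists built by hand (they are illustrative, not the output of key_func):

```python
# Two hand-built keys that differ only in the value: None versus a string.
key_a = [('identifier_country', None)]
key_b = [('identifier_country', 'VE')]

try:
    key_a < key_b  # ends up evaluating None < 'VE'
    comparable = True
except TypeError:
    comparable = False

# After replacing None with '', the comparison is an ordinary string comparison.
key_a_fixed = [(k, v if v is not None else '') for k, v in key_a]
fixed_is_smaller = key_a_fixed < key_b  # '' sorts before 'VE'
```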

The underlying assumption in the solution above is that all dictionary values in your case are either strings or None, and that "empty" values are consistently represented as None (and not, for example, by an empty string). If this is not the case, key_func() would have to be adjusted accordingly to ensure that the resulting lists are always comparable to each other for any dict value expected in the data.
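One possible adjustment, as a rough sketch only: compare each value through its type name and repr(), so that only strings are ever compared. The name `general_key_func` and the sample lists are hypothetical. It assumes the dict keys are strings, and note that values which compare equal but have different types (such as 1 and 1.0) get different keys.

```python
def general_key_func(d):
    """Hypothetical variant of key_func for values of mixed types."""
    # Each item becomes (key, type name, repr), so comparisons only involve strings.
    return sorted((k, type(v).__name__, repr(v)) for k, v in d.items())


# Made-up data mixing ints, strings, None and empty strings.
mixed_a = [{'id': 1, 'tag': None}, {'id': 'x', 'tag': ''}]
mixed_b = [{'tag': '', 'id': 'x'}, {'tag': None, 'id': 1}]

same = (sorted(mixed_a, key=general_key_func)
        == sorted(mixed_b, key=general_key_func))
```

With this key, None and '' stay distinct instead of both being mapped to an empty string.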

Also, for large dicts this key function might not be ideal, because comparing long lists of (key, value) pairs can be slow. In that case it would be better to calculate a hash value for each dict instead: the same hash for dicts that compare equal, and ideally a different one otherwise.

plamut
  • How would I go about calculating a hash for each dict? I will try out the solution and update accordingly – unseen_damage Dec 21 '17 at 14:42
  • One idea would be to convert the list computed as the dict's key to a string and hash it. This will work as long as the same values in the dict have the same string representations - which seems to be the case with your data that only contains strings and `None`. If this does not suffice, you might also want to check [this answer](https://stackoverflow.com/a/8714242/5040035) suggesting a much more advanced dict hashing function. – plamut Dec 22 '17 at 00:45
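A minimal sketch of the string-and-hash idea from the comment above. The helper name `dict_digest` and the use of hashlib.sha256 are my own choices, not something the answer prescribes, and the data is a cut-down version of the lists in the question. It assumes that equal values always have the same repr().

```python
import hashlib


def dict_digest(d):
    """Hypothetical helper: a fixed-size sort key for a dict."""
    # Keys are unique within one dict, so sorting its items never compares values.
    canonical = repr(sorted(d.items()))
    return hashlib.sha256(canonical.encode('utf-8')).hexdigest()


# Cut-down version of the data in the question.
list_dict_a = [
    {'identifier_country': None, 'identifier_number': 'Male'},
    {'identifier_country': 'VE', 'identifier_number': '1234567'}]
list_dict_b = [
    {'identifier_number': '1234567', 'identifier_country': 'VE'},
    {'identifier_number': 'Male', 'identifier_country': None}]

same = (sorted(list_dict_a, key=dict_digest)
        == sorted(list_dict_b, key=dict_digest))
```

The sort then compares short hex strings instead of full item lists; the final == still compares the dicts themselves.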
0

You can also check whether each dict in list_dict_a is in list_dict_b:

all([dict_a in list_dict_b for dict_a in list_dict_a])

Out[218]: True
f5r5e5d
  • 1
    Just please mind that this is an O(n^2) solution ... or O(a*b) where a and b are the lengths of list a and b, respectively. The difference can be considerable for larger datasets. – plamut Dec 20 '17 at 19:55
  • Note that this only checks whether all `dict` elements in `list_dict_a` are in `list_dict_b`. If `list_dict_b` contains additional dicts, the expression still returns `True`, so you must add one more check: `len(list_dict_a) == len(list_dict_b)`. – Caco Aug 30 '18 at 11:58
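To illustrate the point raised in the comments, here is a sketch of a stricter check. The helper `same_dicts_unordered` is hypothetical; it is still O(a*b), but it also rejects lists that differ in length or in how often a dict is repeated. The sample dicts are made up.

```python
def same_dicts_unordered(list_a, list_b):
    """Hypothetical helper: True if both lists hold the same dicts, in any order."""
    if len(list_a) != len(list_b):
        return False
    remaining = list(list_b)  # copy, so list_b itself is not modified
    for d in list_a:
        try:
            remaining.remove(d)  # removes the first dict equal to d
        except ValueError:
            return False  # d has no unmatched partner left in list_b
    return True


male = {'identifier_type': 'Gender', 'identifier_number': 'Male'}
foo = {'identifier_type': 'Foo No.', 'identifier_number': '1234567'}

# The plain membership test accepts a shorter list ...
subset_only = all(d in [male, foo] for d in [male])
# ... and, even with equal lengths, a list that repeats one dict.
repeated = all(d in [male, foo] for d in [male, male])
```

Both `subset_only` and `repeated` are True, whereas `same_dicts_unordered` returns False for those two pairs of lists.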
0

You can try this:

list_dict_a = [
{'expiration_date': None, 'identifier_country': None, 'identifier_number': 'Male', 'identifier_type': 'Gender', 'issue_date': None},
{'expiration_date': None, 'identifier_country': 'VE', 'identifier_number':  '1234567', 'identifier_type': 'Foo No.', 'issue_date': None}]

list_dict_b = [
{'identifier_country': 'VE', 'expiration_date': None, 'identifier_type': 'Foo No.', 'issue_date': None, 'identifier_number': '1234567'},
{'identifier_country': None, 'expiration_date': None, 'identifier_type': 'Gender', 'issue_date': None, 'identifier_number': 'Male'}]
new_list = sorted(list_dict_a, key=lambda x:x['identifier_country'] is not None, reverse=True)
print(new_list == list_dict_b)

Output:

True

If you do not know whether the key is present in every dict, you can use dict.get with a default instead:

new_list = sorted(list_dict_a, key=lambda x:x.get('identifier_country', None) is not None, reverse=True)
Ajax1234
  • If I didn't know the key, would I be able to do `key=lambda x:x[0]` instead? – unseen_damage Dec 21 '17 at 14:43
  • @unseen_damage no, because in the scope of the lambda function, `x` is a dictionary, and `x[0]` will raise a `KeyError`. You can, however, use `dict.get` and provide a default argument. Please see my recent edit. – Ajax1234 Dec 21 '17 at 14:47
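A short sketch of the difference described in this comment; the two dicts are made-up examples.

```python
record = {'identifier_country': 'VE', 'identifier_number': '1234567'}

try:
    record[0]  # dicts are indexed by key, and there is no key 0
    lookup_failed = False
except KeyError:
    lookup_failed = True

# dict.get returns a default (None unless another one is given) for a missing key.
missing = {'identifier_number': 'Male'}
country = missing.get('identifier_country')
has_country = missing.get('identifier_country') is not None
```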