1

I would like to get the difference of two lists which have the following structure:

first_dict = [{"a": "abcd","b":"defg", "c":["fng","xvg"]}, {"a": "stg","b":"klsm", "c":["xzy"]}]

second_dict = [{"a": "abcd","b":"defg", "c":["fng","xvg"]}]

I've tried to freeze the sets as follows:

i_set = { frozenset(row.items()) for row in first_dict }
a_set = { frozenset(row.items())  for row in second_dict }
result = [dict(i) for i in i_set - a_set]

Expected result:

v = {"a": "stg","b":"klsm", "c":["xzy"]}

Because "c":[] is a list I get the following error:

TypeError: unhashable type: 'list'

I've had a look at similar answers here and here but they don't work for my problem.

Daniela
  • 861
  • 5
  • 11
  • 28

2 Answers2

1

Since you used set and frozenset I assume you care about efficiency and don't want an O(n**2) algorithm.

In that case you don't just need to convert the dictionaries to frozensets, but also convert the lists to an hashable class. You could use tuples.

Assuming your inputs:

first_dict = [{"a": "abcd","b":"defg", "c":["fng","xvg"]}, {"a": "stg","b":"klsm", "c":["xzy"]}]

second_dict = [{"a": "abcd","b":"defg", "c":["fng","xvg"]}]

You could use:

def convert(dictionary):
    return frozenset((key, tuple(value) if isinstance(value, list) else value) for key, value in dictionary.items())

def convert_back(frozset):
    return dict((key, list(value) if isinstance(value, tuple) else value) for key, value in frozset)

i_set = { convert(row) for row in first_dict }
a_set = { convert(row) for row in second_dict }
result = [convert_back(i) for i in i_set - a_set]

Then the result would be:

[{'a': 'stg', 'c': ['xzy'], 'b': 'klsm'}]

However this won't discriminate between tuples that have been tuples in the original and tuples that have been lists in the original.


You could solve that (if you're sure that the dictionary and its contents are not mutated during the operations) by wrapping the dictionaries in your own hashable class. This has the advantage that you don't need to convert them back afterwards, you can simply "unwrap" the value.

class HashableDictionaryWithListValues:
    def __init__(self, dictionary):
        converted = frozenset((key, tuple(value) if isinstance(value, list) else value) for key, value in dictionary.items())
        self._hash = hash(converted)
        self._converted = converted
        self.dictionary = dictionary

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        return self._converted == other._converted

i_set = { HashableDictionaryWithListValues(row) for row in first_dict }
a_set = { HashableDictionaryWithListValues(row) for row in second_dict }
result = [i.dictionary for i in i_set - a_set]

The result will be:

[{'a': 'stg', 'b': 'klsm', 'c': ['xzy']}]

Although you could also avoid the whole conversion/wrapping and do an O(n**2) approach:

def difference(first, second):
    for item in first:
        if item not in second:
            yield item

list(difference(first_dict, second_dict))

Which also gives the expected result:

[{'a': 'stg', 'b': 'klsm', 'c': ['xzy']}]
MSeifert
  • 145,886
  • 38
  • 333
  • 352
0

Why frozenset?, just use:

>>> first_dict = [{"a": "abcd","b":"defg", "c":["fng","xvg"]}, {"a": "stg","b":"klsm", "c":["xzy"]}]
>>> second_dict = [{"a": "abcd","b":"defg", "c":["fng","xvg"]}]
>>> [i for i in first_dict if i not in second_dict]
[{'a': 'stg', 'b': 'klsm', 'c': ['xzy']}]
>>> 

Or if you want one singular, change:

>>> [i for i in first_dict if i not in second_dict]

To:

>>> [i for i in first_dict if i not in second_dict][0]

Which outputs:

{'a': 'stg', 'b': 'klsm', 'c': ['xzy']}
U13-Forward
  • 69,221
  • 14
  • 89
  • 114
  • "Why frozenset?" Probably because of performance. Your approach will scale quadratic (because `in` is `O(n)` and it's done `n` times it will be `O(n**2)`). While set-difference is `O(n)`. – MSeifert Jun 02 '19 at 11:29
  • Thanks @U9-Forward it's simple but it has some performancve concerns. – Daniela Jun 02 '19 at 15:05