1
elements = [
{'a' : 1, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 3, 'c': 3},
{'a' : 1, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 2},
{'a' : 1, 'b' : 2, 'c': 3, 'd' : 4},
{'v' : [1,2,3]}
]

Given above list of dict in Python, how to deduplicate to the following collection(order doesn't matter) efficiently

result = [
{'a' : 1, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 3, 'c': 3},
{'a' : 2, 'b' : 2},
{'a' : 1, 'b' : 2, 'c': 3, 'd' : 4},
{'v' : [1,2,3]}
]

The naive method is to use set, however dict in Python is unhashable. Right now, my solution is to serialize dict to String like json format (since dict has no order, two different strings can correspond to same dict. I have to keep some order). However this method has too high time complexity.

My Questions:

  1. How to efficiently deduplicate dictionary in Python?

  2. More generally, is there any method to override a class's hashCode like Java to use set or dict?

mkrieger1
  • 19,194
  • 5
  • 54
  • 65
maplemaple
  • 1,297
  • 6
  • 24

1 Answers1

0

For your toy example with few data you can use the repr of the inner dictionaries as key for a new dictionary, then collect all the values:

elements = [{'a' : 1, 'b' : 2, 'c': 3},             {'a' : 2, 'b' : 2, 'c': 3},
            {'a' : 2, 'b' : 3, 'c': 3},             {'a' : 1, 'b' : 2, 'c': 3},
            {'a' : 2, 'b' : 2, 'c': 3},             {'a' : 2, 'b' : 2},
            {'a' : 1, 'b' : 2, 'c': 3, 'd' : 4},    {'v' : [1,2,3]}]

kv = {repr(inner):inner for inner in elements}
elements = list(kv.values())

print(elements)

Output:

[{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 2, 'c': 3}, {'a': 2, 'b': 3, 'c': 3}, 
 {'a': 2, 'b': 2}, {'a': 1, 'b': 2, 'c': 3, 'd': 4}, {'v': [1, 2, 3]}]

If you check the id() of your inner dictionaries you'll see the last one survives.

Patrick Artner
  • 50,409
  • 9
  • 43
  • 69