In the list below, the 3rd and 4th elements should be considered duplicates, because their 'start' and 'end' values are simply swapped:
{'start': '222', 'end': '333', 'type':'c'},
{'start': '333', 'end': '222', 'type':'c'}
I need to build a relations list (or set) that doesn't contain duplicates like the ones above. Suppose the input comes from list_of_dicts; this is the code I use to achieve that:
relations = []
list_of_dicts = [
    {'start': '123', 'end': '456', 'type': 'a'},
    {'start': '111', 'end': '122', 'type': 'b'},
    {'start': '222', 'end': '333', 'type': 'c'},
    {'start': '333', 'end': '222', 'type': 'c'},
]
duplicate_keys = set()  # keys of the relations already kept
for my_dict in list_of_dicts:
    # Sort the characters of start + end + type so that swapping
    # 'start' and 'end' produces the same key.
    duplicate_key = ''.join(sorted(my_dict['start'] + my_dict['end'] + my_dict['type']))
    if duplicate_key not in duplicate_keys:
        relations.append(my_dict)
        duplicate_keys.add(duplicate_key)
print(relations)
This seems to work. My list_of_dicts is expected to be large, for example 100 million entries. Is this a fast way to do it? Also, list_of_dicts here is only for illustration and convenience; the real 'relations' list is built from similar input.
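
For comparison, here is a minimal sketch of a variant that keys on a tuple rather than on sorted characters. The helper name dedupe_relations is hypothetical, and the sketch assumes 'start', 'end' and 'type' are always strings. One reason to consider it: sorting the characters of the concatenated string can make unrelated records collide (for example, start '12' / end '3' and start '1' / end '23' give the same key), whereas a tuple of the two ordered endpoints plus the type keeps them apart. Whether it is actually faster on 100 million entries would need to be measured.

```python
def dedupe_relations(dicts):
    """Yield each dict whose unordered (start, end) pair and type are new."""
    seen = set()
    for d in dicts:
        start, end = d['start'], d['end']
        # Put the two endpoints in a fixed order so that (a, b) and (b, a)
        # map to the same key; the type stays a separate tuple element.
        if start <= end:
            key = (start, end, d['type'])
        else:
            key = (end, start, d['type'])
        if key not in seen:
            seen.add(key)
            yield d


list_of_dicts = [
    {'start': '123', 'end': '456', 'type': 'a'},
    {'start': '111', 'end': '122', 'type': 'b'},
    {'start': '222', 'end': '333', 'type': 'c'},
    {'start': '333', 'end': '222', 'type': 'c'},
]
relations = list(dedupe_relations(list_of_dicts))
print(relations)
```

Because this is a generator, the input can also be any iterable read lazily rather than a fully built list; the set of seen keys still grows with the number of unique relations.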