I have a list of dictionaries with the keys street, number and some_flag.
My goal is to search the dicts for duplicates in the keys street and number. If these two key/value pairs are identical for two or more dicts, I want to set the some_flag key of each of those dicts to 1.
Please see the reproducible example below.
Starting list of dictionaries:
a = [
{'street': 'ocean drive', 'number': '1', 'some_flag': 0},
{'street': 'ocean drive', 'number': '3', 'some_flag': 0},
{'street': 'ocean drive', 'number': '4', 'some_flag': 0}, # duplicate street / number keys
{'street': 'ocean drive', 'number': '4', 'some_flag': 0}, # duplicate street / number keys
{'street': 'apple tree rd.', 'number': '3', 'some_flag': 0},
]
Expected output:
a_checked = [
{'street': 'ocean drive', 'number': '1', 'some_flag': 0},
{'street': 'ocean drive', 'number': '3', 'some_flag': 0},
{'street': 'ocean drive', 'number': '4', 'some_flag': 1}, # duplicate street / number keys
{'street': 'ocean drive', 'number': '4', 'some_flag': 1}, # duplicate street / number keys
{'street': 'apple tree rd.', 'number': '3', 'some_flag': 0},
]
My best effort:
The code I've got so far is derived from Aaron's answer (here) and the community wiki's answer (here):
from collections import defaultdict, Counter

items = defaultdict(list)  # maps each street to the list of its numbers
for row in a:
    items[row['street']].append(row['number'])  # collect the 'number' values for each 'street' key

for key in items.keys():
    if checkIfDuplicates(items[key]):  # if any 'number' occurs more than once --> function definition see below
        duplicate_dict = {}
        duplicate_dict['numbers'] = [item for item, count in Counter(items[key]).items() if count > 1]  # storing duplicate numbers in dict
        duplicate_dict['street'] = key  # storing street name in same dict
Function to check if given list contains any duplicates (from here):
def checkIfDuplicates(listOfElems):
    if len(listOfElems) == len(set(listOfElems)):
        return False
    else:
        return True
Current output:
print(duplicate_dict)
{'numbers': ['4'], 'street': 'ocean drive'}
With my approach, I would now have to match the duplicate_dict with the original list a, which doesn't seem very efficient.
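To make that concrete, here is a rough sketch of the matching step I have in mind. It assumes duplicate_dict holds the result for a single street, as in the output shown above; with duplicates on several streets I would presumably need one such dict per street and repeat the pass for each.

```python
# Sample data from the question.
a = [
    {'street': 'ocean drive', 'number': '1', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '3', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '4', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '4', 'some_flag': 0},
    {'street': 'apple tree rd.', 'number': '3', 'some_flag': 0},
]

# Result of the duplicate search (assumed here: one dict for one street).
duplicate_dict = {'numbers': ['4'], 'street': 'ocean drive'}

# Second pass over the original list: flag every row whose street matches
# and whose number is among the duplicated numbers for that street.
for row in a:
    if (row['street'] == duplicate_dict['street']
            and row['number'] in duplicate_dict['numbers']):
        row['some_flag'] = 1
```

This yields the expected flags for the sample data, but it means walking the list a second time after building the intermediate structures.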
Are there more direct ways to solve this problem?