Summary: I have a list of dictionaries, some of the elements being "duplicated". How can I define "duplicate" and how can I influence which element stays?
Detailed question:
There is a great answer to a Stack Overflow question on how to remove duplicates from a list of dictionaries:
l = [
{'name': 'john', 'age': 20},
{'name': 'john', 'age': 20},
{'name': 'mark', 'age': 21},
{'name': 'john', 'age': 99},
{'name': 'john'}
]
print([dict(tupleized) for tupleized in set(tuple(item.items()) for item in l)])
I would like, however, to control the definition of "duplicate". For instance, let's say that any dictionaries which have the same value for 'name' are "duplicates" to me. The expected output would then be two elements only (one entry for 'john' and one for 'mark').
I would also like to control which of the "duplicates" is retained, for instance only the one with the highest 'age' for a given 'name'. The pruned-down list would therefore be
[{'age': 99, 'name': 'john'}, {'age': 21, 'name': 'mark'}]
How can I do this in a clever and Pythonic way? (My current idea is to loop over the list, group entries by the key 'name', and keep track of the most relevant entry for each name (the one with the highest age in the case above) so that it can be copied into a new list, but I was hoping for something more elegant.)
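For reference, the loop-based idea described above might look roughly like the sketch below. The helper name `dedupe_by` and its `key` and `rank` parameters are made up for illustration, and treating an entry with no 'age' as ranking below every entry that has one is an assumption on my part, since the example does not say how such entries should be handled.

```python
def dedupe_by(items, key, rank):
    """Keep one dict per distinct value of item[key], preferring the highest rank(item).

    Hypothetical helper, not taken from the linked answer.
    """
    best = {}
    for item in items:
        k = item[key]
        # Store the first entry seen for this key, or replace it with a better one.
        if k not in best or rank(item) > rank(best[k]):
            best[k] = item
    return list(best.values())


l = [
    {'name': 'john', 'age': 20},
    {'name': 'john', 'age': 20},
    {'name': 'mark', 'age': 21},
    {'name': 'john', 'age': 99},
    {'name': 'john'},
]

# Assumption: a missing 'age' ranks lower than any real age.
pruned = dedupe_by(l, key='name', rank=lambda d: d.get('age', float('-inf')))
print(pruned)
```

Here `key` carries the definition of "duplicate" and `rank` decides which duplicate survives, so both can be changed independently.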