I was looking for a set()-like method to deduplicate a list, except that the items in the original list are not hashable (they are dicts).
I spent a while looking for something adequate, and I ended up writing this little function:
    def deduplicate_list(lst, key):
        """Return a new list with duplicates removed, comparing items by item[key]."""
        output = []
        seen = []
        for item in lst:
            # Keep only the first item carrying each key value.
            if item[key] not in seen:
                output.append(item)
                seen.append(item[key])
        return output
Provided that key is given correctly and is a string, this function does its job pretty well. Needless to say, if I learn about a built-in or a standard library module which offers the same functionality, I'll happily drop my little routine in favor of a more standard and robust choice.
Are you aware of such an implementation?
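For comparison, here is a minimal sketch of the closest near-built-in idiom I could come up with, assuming the key values are hashable (the records below are made up): a dict comprehension keyed on the deduplication key. Note that it keeps the last occurrence of each key rather than the first, so it is not a drop-in replacement for the function above:

    # Sketch only: dedupe via a dict comprehension keyed on "id".
    # The sample records are hypothetical.
    records = [
        {"id": "1", "handle": "jsmith"},
        {"id": "1", "handle": "jsmith2"},
    ]
    # Later items overwrite earlier ones, so the *last* duplicate wins.
    unique = list({r["id"]: r for r in records}.values())
    print(unique)  # [{'id': '1', 'handle': 'jsmith2'}]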
-- Note
The following one-liner, taken from this answer,

    [dict(t) for t in set([tuple(d.items()) for d in l])]

while clever, won't work, because the items I have to work with are nested dicts.
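To see why (a minimal sketch with a made-up record): when d contains a nested dict, tuple(d.items()) still carries an unhashable dict value, so building the set raises a TypeError:

    # Hypothetical record mirroring the nested shape of my data.
    d = {"id": "1234", "attributes": {"handle": "jsmith"}}
    t = tuple(d.items())
    # t == (('id', '1234'), ('attributes', {'handle': 'jsmith'}))
    try:
        set([t])
    except TypeError as exc:
        print(exc)  # unhashable type: 'dict'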
-- Example
For clarity, here is an example of using the routine:
    with_duplicates = [
        {
            "type": "users",
            "attributes": {
                "first-name": "John",
                "email": "john.smith@gmail.com",
                "last-name": "Smith",
                "handle": "jsmith"
            },
            "id": "1234"
        },
        {
            "type": "users",
            "attributes": {
                "first-name": "John",
                "email": "john.smith@gmail.com",
                "last-name": "Smith",
                "handle": "jsmith"
            },
            "id": "1234"
        }
    ]

    without_duplicates = deduplicate_list(with_duplicates, key='id')
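After this call, without_duplicates holds a single entry: the first of the two identical dicts with id "1234".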