2

Pulling my hair out with this one.

I have a list of dictionaries without a unique primary ID key for each unique entry (the dictionary is built on the fly):

dicts = [{'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
         {'firstname': 'john', 'lastname': 'roe', 'code': 'roe'},
         {'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
         {'firstname': 'thom', 'lastname': 'doe', 'code': 'crumpets'},
]

How do I filter a list of dictionaries like this so that any repeated dictionary within the list is removed? I need to check whether all three values of a dictionary match those of another dictionary in the list, and discard it from the list if they do.

So, for my example above, the first and third "entries" need to be removed as they are duplicates.

Willem Van Onsem

3 Answers

5

You can create frozensets from the items of each dict and put those in a set to remove dupes:

dcts = [dict(d) for d in set(frozenset(d.items()) for d in dcts)]
print(dcts)

[{'code': 'roe', 'firstname': 'john', 'lastname': 'roe'},
 {'code': 'crumpets', 'firstname': 'thom', 'lastname': 'doe'},
 {'code': 'crumpets', 'firstname': 'john', 'lastname': 'doe'}]
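As a rough illustration of why this works, the sketch below checks that `frozenset(d.items())` is hashable and does not depend on key order, so equal dicts collapse to one key in a set. The names `a`, `b`, `key_a` and `key_b` are made up for the example, and it assumes every value in the dict is itself hashable.

```python
# Sketch: why frozenset(d.items()) can serve as a deduplication key.
a = {'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'}
b = {'code': 'crumpets', 'lastname': 'doe', 'firstname': 'john'}  # same items, other order

key_a = frozenset(a.items())
key_b = frozenset(b.items())

# Equal dicts give equal, hashable keys, so a set keeps only one of them.
same_key = key_a == key_b
unique_keys = {key_a, key_b}

# Caveat: an unhashable value (here a list) makes the key unhashable too.
try:
    hash(frozenset({'tags': ['x']}.items()))
    hashable = True
except TypeError:
    hashable = False

print(same_key, len(unique_keys), hashable)  # prints: True 1 False
```

Note that going through a set does not keep the original order of the list, which is why the output above is shuffled.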

If you instead want to remove every entry that has a duplicate (keeping no copy of it), you can use a Counter:

from collections import Counter

dcts = [dict(d)
        for d, cnt in Counter(frozenset(d.items()) for d in dcts).items()
        if cnt == 1]
print(dcts)

[{'code': 'roe', 'firstname': 'john', 'lastname': 'roe'},
 {'code': 'crumpets', 'firstname': 'thom', 'lastname': 'doe'}]
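If the original order and the original dict objects matter, the same Counter idea can be written as a small function. This is only a sketch: `drop_all_duplicated` is a hypothetical helper name, and it again assumes all values are hashable.

```python
from collections import Counter

def drop_all_duplicated(dicts):
    """Return only the dicts that occur exactly once, in their original order."""
    keys = [frozenset(d.items()) for d in dicts]
    counts = Counter(keys)
    return [d for d, k in zip(dicts, keys) if counts[k] == 1]

dicts = [{'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
         {'firstname': 'john', 'lastname': 'roe', 'code': 'roe'},
         {'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
         {'firstname': 'thom', 'lastname': 'doe', 'code': 'crumpets'}]

result = drop_all_duplicated(dicts)
print(result)
```

With the sample data from the question, both "john doe" entries are dropped and the "roe" and "thom" entries remain, in that order.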
Moses Koledoye
  • Will see if I can implement this - but not sure I understand how you can define a dictionary on the fly and then cross check it. Will need to implement this and do some more reading I think. Thank you. –  Jun 19 '17 at 10:22
  • Is there any reason other than onelining this, to not use simple for loop with new container like this: `new_ct = []` and `for item in ct: if item not in new_ct: new_ct.append(item)` ? – Sardorbek Imomaliev Jun 19 '17 at 10:22
  • @MichaelRoberts First part of his answer does what you want `dcts = [dict(d) for d in set(frozenset(d.items()) for d in dcts)]` – Sardorbek Imomaliev Jun 19 '17 at 10:23
  • @SardorbekImomaliev Apologies, yes that is correct. I was in the middle of editing my comment when I realised. –  Jun 19 '17 at 10:25
  • To expand upon my above comment, surely dcts is referenced before an assignment? –  Jun 19 '17 at 10:28
  • @MichaelRoberts `dcts` is your list of dicts it is called `dicts` in your question – Sardorbek Imomaliev Jun 19 '17 at 10:28
  • I see. Yes this works - I should probably have given a different variable name for my dict. Accepted answer - very elegantly done in one line. –  Jun 19 '17 at 10:31
2

Removing duplicates from a list of non-hashable elements requires you to make them hashable on the fly:

def remove_duplicated_dicts(elements):
    seen = set()
    result = []
    for element in elements:
        # Sort the items so that key order does not affect the comparison.
        element_as_tuple = tuple(sorted(element.items()))
        if element_as_tuple not in seen:
            seen.add(element_as_tuple)
            result.append(element)
    return result

d = [{'firstname': 'john', 'lastname': 'doe', 'code': "crumpets"},
        {'firstname': 'john', 'lastname': 'roe', 'code': "roe"},
        {'firstname': 'john', 'lastname': 'doe', 'code': "crumpets"},
        {'firstname': 'thom', 'lastname': 'doe', 'code': "crumpets"},
]

print(remove_duplicated_dicts(d))

PS.

Non-obvious differences with the accepted answer of Moses Koledoye (as of 2017-06-19 at 13:00:00):

  • preservation of the original list order;
  • faster conversions: dict -> tuple instead of dict -> frozenset -> dict (take it with a grain of salt: I have not benchmarked this).
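For anyone who wants to check the speed claim, a rough timing sketch with `timeit` might look like the following. The function names `via_frozenset` and `via_seen_set`, the data size and the repeat count are all arbitrary choices for illustration, and the seen-set variant here builds its key from sorted items (an assumption, so that key order cannot matter). Timings depend on machine and Python version, so no numbers are claimed.

```python
import timeit

# Sample data from the question, repeated to get a measurable workload.
data = [{'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
        {'firstname': 'john', 'lastname': 'roe', 'code': 'roe'},
        {'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
        {'firstname': 'thom', 'lastname': 'doe', 'code': 'crumpets'}] * 1000

def via_frozenset(dcts):
    # One-liner from the accepted answer: order is not preserved.
    return [dict(d) for d in set(frozenset(d.items()) for d in dcts)]

def via_seen_set(elements):
    # Loop with a "seen" set: keeps the first occurrence, in original order.
    seen = set()
    result = []
    for element in elements:
        key = tuple(sorted(element.items()))
        if key not in seen:
            seen.add(key)
            result.append(element)
    return result

for func in (via_frozenset, via_seen_set):
    seconds = timeit.timeit(lambda: func(data), number=20)
    print(func.__name__, round(seconds, 4))
```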
Aristide
  • Thank you very much for this answer - although I am very grateful for your answer I have accepted a higher voted answer (I was also working within a function with Django so I didn't want to nest a base dictionary sorting method above Django based methods). But thank you, nonetheless. –  Jun 19 '17 at 10:32
1

Given that the values of the dictionaries are hashable, we can write our own uniqueness filter:

def uniq(iterable, key=lambda x: x):
    keys = set()
    for item in iterable:
        ky = key(item)
        if ky not in keys:
            yield item
            keys.add(ky)

We can then simply use the filter, like:

list(uniq(dicts,key=lambda x:(x['firstname'],x['lastname'],x['code'])))

The filter maintains the original order, and will - for this example - generate:

>>> list(uniq(dicts,key=lambda x:(x['firstname'],x['lastname'],x['code'])))
[{'code': 'crumpets', 'firstname': 'john', 'lastname': 'doe'},
 {'code': 'roe', 'firstname': 'john', 'lastname': 'roe'},
 {'code': 'crumpets', 'firstname': 'thom', 'lastname': 'doe'}]
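Because the key function is a parameter, the same filter can compare on all fields without naming them, or on a subset of fields. The sketch below repeats `uniq` so that it runs on its own; the two key functions and the variable names `all_fields` and `by_lastname` are illustrative choices, not part of the original answer.

```python
def uniq(iterable, key=lambda x: x):
    """Yield items whose key has not been seen before, keeping order."""
    seen = set()
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item

dicts = [{'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
         {'firstname': 'john', 'lastname': 'roe', 'code': 'roe'},
         {'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
         {'firstname': 'thom', 'lastname': 'doe', 'code': 'crumpets'}]

# Compare on every key/value pair, whatever the keys happen to be.
all_fields = list(uniq(dicts, key=lambda d: frozenset(d.items())))

# Compare on a single field: only the first dict per lastname is kept.
by_lastname = list(uniq(dicts, key=lambda d: d['lastname']))

print(len(all_fields), len(by_lastname))  # prints: 3 2
```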
Willem Van Onsem
  • Thank you very much for this answer - although I am very grateful for your answer I have accepted a higher voted answer (I was also working within a function with Django so I didn't want to nest a base dictionary sorting method above Django based methods). But thank you, nonetheless. –  Jun 19 '17 at 10:33