Remove duplicates key from list of dictionaries python

Question

I am trying to remove duplicates from the following list of dictionaries:

distinct_cur = [
    {'rtc': 0, 'vf': 0, 'mtc': 0, 'doc': 'good job', 'foc': 195, 'st': 0.0, 'htc': 2, '_id': ObjectId('58e86a550a0aeff4e14ca6bb'), 'ftc': 0},
    {'rtc': 0, 'vf': 0, 'mtc': 0, 'doc': 'good job', 'foc': 454, 'st': 0.8, 'htc': 1, '_id': ObjectId('58e8d03958ae6d179c2b4413'), 'ftc': 1},
    {'rtc': 0, 'vf': 2, 'mtc': 1, 'doc': 'test', 'foc': 45, 'st': 0.8, 'htc': 12, '_id': ObjectId('58e8d03958ae6d180c2b4446'), 'ftc': 0}
]

Two dictionaries should count as duplicates when their 'doc' values are the same, in which case one of them should be removed. I have tried the following solution:

distinct_cur = [dict(y) for y in set(tuple(x.items()) for x in cur)]

But duplicates are still present in the final list.

Below is the desired output. The first and second entries of distinct_cur have the same 'doc' value ('good job'), so only one of them is kept:

[
    {'rtc': 0, 'vf': 0, 'mtc': 0, 'doc': 'good job', 'foc': 195, 'st': 0.0, 'htc': 2, '_id': ObjectId('58e86a550a0aeff4e14ca6bb'), 'ftc': 0}, 
    {'rtc': 0, 'vf': 2, 'mtc': 1, 'doc': 'test', 'foc': 45, 'st': 0.8, 'htc': 12, '_id': ObjectId('58e8d03958ae6d180c2b4446'), 'ftc': 0}
]

A dictionary cannot contain duplicate keys. What do you mean by removing a duplicate key? What should be removed? — Mazdak, Apr 10 '17 at 09:28
So after you find dictionaries with the same `doc` value, how do you decide which one should be removed? — Mazdak, Apr 10 '17 at 09:30
Here are some similar questions http://stackoverflow.com/questions/15511903/remove-duplicates-from-a-list-of-dictionaries-when-only-one-of-the-key-values-is and http://stackoverflow.com/questions/9427163/remove-duplicate-dict-in-list-in-python — Mazdak, Apr 10 '17 at 09:32

Jean-François Fabre · Accepted Answer · 2023-04-04T20:49:58.243

You're creating a set out of different elements and expect that it will remove the duplicates based on a criterion that only you know.
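To make that concrete, here is a minimal sketch of why the set-based attempt leaves both 'good job' records in place. It uses a trimmed-down copy of the question's data (keeping only 'doc' and 'foc' is my simplification, and the name `records` is hypothetical). The tuples differ in 'foc', so the set treats them as distinct:

```python
# Trimmed sample data: only two of the original keys are kept for brevity.
records = [
    {'doc': 'good job', 'foc': 195},
    {'doc': 'good job', 'foc': 454},
    {'doc': 'test', 'foc': 45},
]

# A set compares whole tuples, so records that differ in any field stay distinct.
as_tuples = {tuple(r.items()) for r in records}
unique_whole_records = len(as_tuples)

# Comparing only the 'doc' values is what actually collapses the duplicates.
unique_docs = len({r['doc'] for r in records})

print(unique_whole_records, unique_docs)  # 3 2
```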

You have to iterate through your list and add a dictionary to the result only if its doc value has not been seen before, for instance like this:

done = set()
result = []
for d in distinct_cur:
    if d['doc'] not in done:
        done.add(d['doc'])  # note it down for further iterations
        result.append(d)

This keeps only the first dictionary for each doc value, by recording the values already seen in an auxiliary set.
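If the same logic is needed for other keys, the loop can be wrapped in a small helper. This is only a sketch: the function name `dedupe_by_key` and the trimmed sample data are my own, and it assumes every item has the key and that its value is hashable.

```python
def dedupe_by_key(items, key):
    """Return the items whose `key` value has not appeared earlier in the list."""
    seen = set()
    result = []
    for item in items:
        value = item[key]  # raises KeyError if an item lacks the key
        if value not in seen:
            seen.add(value)  # remember the value for later iterations
            result.append(item)
    return result


# Trimmed sample data: only two of the original keys are kept for brevity.
records = [
    {'doc': 'good job', 'foc': 195},
    {'doc': 'good job', 'foc': 454},
    {'doc': 'test', 'foc': 45},
]

deduped = dedupe_by_key(records, 'doc')
print([r['foc'] for r in deduped])  # [195, 45]
```

The set lookup keeps the whole pass linear in the length of the list.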

Another possibility is to build a dictionary keyed on each item's 'doc' value, iterating backwards through the list so that the first items overwrite the later ones:

result = list({i['doc']:i for i in reversed(distinct_cur)}.values())
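One thing to be aware of: the reversed-dictionary version can return the surviving items in a different order than the loop version, because keys are inserted while walking the list backwards. If the original order matters, one alternative (my suggestion, not part of the answer above) is `dict.setdefault`, which stores only the first dictionary seen for each 'doc' value. The sketch relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7, and uses trimmed sample data.

```python
# Trimmed sample data: only two of the original keys are kept for brevity.
records = [
    {'doc': 'good job', 'foc': 195},
    {'doc': 'good job', 'foc': 454},
    {'doc': 'test', 'foc': 45},
]

# Reversed-dict version: keeps first occurrences, ordered by the backwards walk.
via_reversed = list({r['doc']: r for r in reversed(records)}.values())

# setdefault version: keeps first occurrences in their original order.
first_seen = {}
for r in records:
    first_seen.setdefault(r['doc'], r)  # no-op if the 'doc' value is already stored
via_setdefault = list(first_seen.values())

print([r['foc'] for r in via_reversed])    # [45, 195]
print([r['foc'] for r in via_setdefault])  # [195, 45]
```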

The `result` object in the second solution will be of type `dict_values`. You need to convert it to `list` type using `list()` function. — Junye Huang, Apr 04 '23 at 16:50

score 5 · Answer 2 · answered Apr 10 '17 at 09:33

I see two similar solutions, and which one fits depends on your problem domain: do you want to keep the first instance of a key or the last instance?

Using the last (so as to overwrite the previous matches) is simpler:

d = list({r['doc']: r for r in distinct_cur}.values())  # list() is needed on Python 3, where values() returns a view
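As a quick check of which record survives, the sketch below runs the keep-last comprehension on a trimmed-down copy of the question's data (keeping only 'doc' and 'foc' is my simplification). The later 'good job' record wins, which differs from the desired output in the question, where the first one is kept.

```python
# Trimmed sample data: only two of the original keys are kept for brevity.
records = [
    {'doc': 'good job', 'foc': 195},
    {'doc': 'good job', 'foc': 454},
    {'doc': 'test', 'foc': 45},
]

# Later records overwrite earlier ones that share the same 'doc' value.
last_seen = {r['doc']: r for r in records}
kept = list(last_seen.values())

print([r['foc'] for r in kept])  # [454, 45]
```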

answered Apr 10 '17 at 09:33 by smassey

score 3 · Answer 3 · answered Feb 14 '20 at 07:31

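A one-liner that deduplicates the list of dictionaries distinct_cur on the 'doc' key. It keeps the last occurrence of each 'doc' value and rescans the rest of the list for every item, so it is quadratic in the list length: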

[i for n, i in enumerate(distinct_cur) if i.get('doc') not in [y.get('doc') for y in distinct_cur[n + 1:]]]

answered Feb 14 '20 at 07:31 by Alec

score 1 · Answer 4 · answered Apr 10 '17 at 09:31

Try this:

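distinct_cur = [dict(t) for t in set([tuple(d.items()) for d in distinct_cur])]

This worked for me, but note that it only removes dictionaries that are identical in every key and value, so it does not deduplicate on 'doc' alone.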

answered Apr 10 '17 at 09:31 by Yuval Pruss
