How to delete duplicated dictionary objects from a List of dictionaries

Question

I want to delete duplicated dictionary objects from a list of dictionaries. When several dicts in the list share the same 'plate' value, I want to keep only one of them.

datalist = [

{
    'plate': "01",
    'confidence' : "80"
},

{
    'plate': "01",
    'confidence' : "60"
},

{
    'plate': "02",
    'confidence' : "91"
},

{
    'plate': "02",
    'confidence' : "91"
},
]

My output should be like this:

datalist = [

{
    'plate': "01",
    'confidence' : "80"
},

{
    'plate': "02",
    'confidence' : "91"
},
]

This is my code, but I'm not getting the exact result.

def filter(datalist):
    previous = ""
    for data in datalist:
        current  = data['plate']
        if current is previous:
            datalist.remove(data)
        previous = current 

    return datalist

datalist = [

    {
        'plate': "01",
        'confidence' : "80"
    },

    {
        'plate': "01",
        'confidence' : "60"
    },

    {
        'plate': "02",
        'confidence' : "91"
    },

    {
        'plate': "02",
        'confidence' : "91"
    },
]


print (filter(datalist))

This gives me the output:

[

    {
        'plate': "01",
        'confidence' : "80"
    },

    {
        'plate': "02",
        'confidence' : "91"
    },

    {
        'plate': "02",
        'confidence' : "91"
    },
]

This is not what I expected. What's wrong with my code?

Related, not exact duplicate as here we only want to consider one key when considering duplicates: [Remove duplicate dict in list in Python](https://stackoverflow.com/questions/9427163/remove-duplicate-dict-in-list-in-python) — jpp, Jan 04 '19 at 13:04
you can also use pandas import pandas as pd; df = pd.DataFrame(data = datalist); df.drop_duplicates(subset = ['plate'],keep='first',inplace=True); output = df.to_dict(orient='record') — LMSharma, Jan 04 '19 at 13:08

Dani Mesejo · Accepted Answer · 2019-01-04T13:15:02.027

6

If any element from the groups of duplicates is acceptable, you could do:

datalist = [
    {'plate': "01", 'confidence': "80"},
    {'plate': "01", 'confidence': "60"},
    {'plate': "02", 'confidence': "91"},
    {'plate': "02", 'confidence': "91"},
]

result = list({ d['plate'] : d for d in datalist }.values())
print(result)

Output

[{'plate': '02', 'confidence': '91'}, {'plate': '01', 'confidence': '60'}]

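The idea is to create a dictionary whose keys are the plate values and whose values are the dictionaries themselves. For each plate, a later entry overwrites an earlier one, so the last duplicate is kept. (The output order shown above comes from an older Python; on Python 3.7+, where dicts preserve insertion order, the '01' entry is printed first.) To keep the first entry of each group of duplicates instead, use reversed: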

result = list({d['plate']: d for d in reversed(datalist)}.values())

Output

[{'plate': '02', 'confidence': '91'}, {'plate': '01', 'confidence': '80'}]
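The comprehension can be unrolled into an explicit loop to see why the last duplicate wins by default. This is only an illustrative sketch; the name by_plate is introduced here and is not part of the answer.

```python
datalist = [
    {'plate': "01", 'confidence': "80"},
    {'plate': "01", 'confidence': "60"},
    {'plate': "02", 'confidence': "91"},
    {'plate': "02", 'confidence': "91"},
]

# Equivalent to {d['plate']: d for d in datalist}
by_plate = {}
for d in datalist:
    # Assigning to an existing key replaces its value, so a later
    # dict with the same plate overwrites the earlier one.
    by_plate[d['plate']] = d

print(by_plate['01'])  # {'plate': '01', 'confidence': '60'}
```

Iterating over reversed(datalist) visits the first occurrence last, which is why it ends up as the stored value.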

edited Jan 04 '19 at 13:15

answered Jan 04 '19 at 12:58

This produces the wrong result because the last duplicated entries are kept, not the first. – timgeb Jan 04 '19 at 13:08
@timgeb OP didn't clearly specify that they want the first entry - only that they don't want duplicate entries. – Matthias Fischer Jan 04 '19 at 13:09
@MatthiasFischer OP specified the desired output. – timgeb Jan 04 '19 at 13:10
you can still do this by replacing `datalist` with `datalist[::-1]` or `reversed(datalist)` for something more loop-efficient. – Ma0 Jan 04 '19 at 13:10
@timgeb For the first example, but we don't know if they always want the first entry. You're probably right and I'm being pedantic, but it's not explicitly stated. :) – Matthias Fischer Jan 04 '19 at 13:14
@timgeb Updated the answer! – Dani Mesejo Jan 04 '19 at 13:15
@DanielMesejo Thank you so much for the quick response. It works the way I wanted. The only piece of puzzle left was to preserve the order, and for that I added another line of code, `results_sorted = sorted(results, key=lambda k: k['epoch_time'])` Fortunately I got the "Epoch Time" in the list , which and others few keys I removed for the clarity of the question. And I sorted the dictionaries with "epoch time" and it is fine now. – Khaalidi Jan 05 '19 at 19:42

RoadRunner · Answer 2 · 2019-01-04T13:18:43.687

Assuming you only want to keep the first dict found for each plate, you can use dict.setdefault():

datalist = [
    {"plate": "01", "confidence": "80"},
    {"plate": "01", "confidence": "60"},
    {"plate": "02", "confidence": "91"},
    {"plate": "02", "confidence": "91"},
]

result = {}
for d in datalist:
    result.setdefault(d["plate"], d)

print(list(result.values()))
# [{'plate': '01', 'confidence': '80'}, {'plate': '02', 'confidence': '91'}]

If you instead want the last duplicates, simply iterate over reversed(datalist).
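A minimal sketch of the "last duplicates" variant, assuming the same sample data as above. The name last_seen is introduced here for illustration.

```python
datalist = [
    {"plate": "01", "confidence": "80"},
    {"plate": "01", "confidence": "60"},
    {"plate": "02", "confidence": "91"},
    {"plate": "02", "confidence": "91"},
]

result = {}
for d in reversed(datalist):
    # setdefault stores d only if the plate is not already a key,
    # so walking backwards keeps the last occurrence of each plate.
    result.setdefault(d["plate"], d)

last_seen = list(result.values())
print(last_seen)
```

Note that the result follows the reversed iteration order, so it may need sorting or reversing if the original order matters.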

score 3 · Answer 3 · answered Jan 04 '19 at 13:09

You can use the unique_everseen recipe from the itertools documentation, also available in the third-party more_itertools package:

from more_itertools import unique_everseen
from operator import itemgetter    

datalist = list(unique_everseen(datalist, key=itemgetter('plate')))

Internally, this solution uses a set to keep track of the plates already seen, yielding only dictionaries with new plate values. Ordering is therefore maintained, and only the first instance of any given plate is kept.
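If installing more_itertools is not an option, the set-based idea described above can be sketched in a few lines. This is a simplified stand-in, not the library's actual implementation (which handles more cases); the name unique_by is hypothetical.

```python
from operator import itemgetter


def unique_by(iterable, key):
    """Yield items whose key(item) has not been seen before, in order."""
    seen = set()
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)  # remember this key so later duplicates are skipped
            yield item


datalist = [
    {'plate': "01", 'confidence': "80"},
    {'plate': "01", 'confidence': "60"},
    {'plate': "02", 'confidence': "91"},
    {'plate': "02", 'confidence': "91"},
]

deduped = list(unique_by(datalist, key=itemgetter('plate')))
print(deduped)
# [{'plate': '01', 'confidence': '80'}, {'plate': '02', 'confidence': '91'}]
```

The sketch assumes the key values are hashable, which holds for the string plates used here.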

answered Jan 04 '19 at 13:09

jpp

Nice use of `itemgetter()` here. – RoadRunner Jan 04 '19 at 13:16
Good answer for the uninitiated, like me. Although, I prefer pandas as more simple method. – Jacob Fuchs Jan 04 '19 at 13:57

timgeb · Answer 4 · 2019-01-04T13:16:53.277

If you are a pandas user, you can consider

>>> import pandas as pd
>>> datalist = [{'plate': "01", 'confidence': "80"}, {'plate': "01", 'confidence': "60"}, {'plate': "02", 'confidence': "91"}, {'plate': "02", 'confidence': "91"}]
>>> pd.DataFrame(datalist).drop_duplicates('plate').to_dict(orient='records')                                                                               
[{'confidence': '80', 'plate': '01'}, {'confidence': '91', 'plate': '02'}]

If you want to keep the last seen duplicates, pass keep='last'.

>>> pd.DataFrame(datalist).drop_duplicates('plate', keep='last').to_dict(orient='records')
[{'confidence': '60', 'plate': '01'}, {'confidence': '91', 'plate': '02'}]

score 3 · Answer 5 · answered Jan 04 '19 at 13:11

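You can also use pandas:

import pandas as pd
df = pd.DataFrame(data=datalist)
df.drop_duplicates(subset=['plate'], keep='first', inplace=True)
# Originally written as orient='record'; older pandas accepted that abbreviation, but the documented value is 'records'.
output = df.to_dict(orient='records')

The keep argument ('first' or 'last') controls which entry of each group of duplicates is kept in the output.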

answered Jan 04 '19 at 13:11

LMSharma

Huh, I did not know that you could pass `'record'` instead of `'records'`. This seems to be undocumented. +1 for obscure knowledge :) – timgeb Jan 04 '19 at 13:23
Update: seems like you can pass `'r'`, `'re'`, ..., `'records'`, `'recordsasdf'`, ... – timgeb Jan 04 '19 at 13:25
:D Thanks @timgeb . We can use anything that starts with 'r' . I have just checked the source code. http://github.com/pandas-dev/pandas/blob/v0.23.4/pandas/core/frame.py#L987-L1102 – LMSharma Jan 04 '19 at 14:12

hamza tuna · Answer 6 · 2019-01-04T13:19:15.867

2

You can use itertools.groupby:

from itertools import groupby
list(map(lambda x: next(x[1]), groupby(sorted(datalist, key=lambda d: d['plate']), lambda d: d['plate'])))

Results:

[{'plate': '01', 'confidence': '80'}, {'plate': '02', 'confidence': '91'}]
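The same groupby approach may be easier to follow when written as a list comprehension with itemgetter. This is a sketch under the same sample data; the name by_plate is introduced here for illustration.

```python
from itertools import groupby
from operator import itemgetter

datalist = [
    {'plate': "01", 'confidence': "80"},
    {'plate': "01", 'confidence': "60"},
    {'plate': "02", 'confidence': "91"},
    {'plate': "02", 'confidence': "91"},
]

by_plate = itemgetter('plate')

# groupby only merges adjacent items, hence the sort by the same key.
# sorted() is stable, so within each plate the original order is kept
# and next(group) returns the first occurrence.
result = [
    next(group)
    for _, group in groupby(sorted(datalist, key=by_plate), key=by_plate)
]
print(result)
# [{'plate': '01', 'confidence': '80'}, {'plate': '02', 'confidence': '91'}]
```

Because of the sort, the result is ordered by plate value rather than by position in the original list.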

edited Jan 04 '19 at 13:19

answered Jan 04 '19 at 13:11

this requires the initial list to be sorted. – Ma0 Jan 04 '19 at 13:12

score 2 · Answer 7 · answered Jan 04 '19 at 13:48

Good old verbose for loop, then list comprehension:

tmp=[]
for dct in datalist:
  if not any(e[0] == dct["plate"] for e in tmp):
    tmp.append((dct["plate"], dct["confidence"]))


[ {"plate": plate, "confidence": confidence} for plate, confidence in tmp ]
#=> [{'plate': '01', 'confidence': '80'}, {'plate': '02', 'confidence': '91'}]
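For comparison with the loop in the question, here is a sketch that keeps the whole dictionaries and builds a new list instead of removing items in place. Two things in the question's version look problematic: calling remove() on a list while iterating over it makes the loop skip the element that follows each removal, and `is` tests object identity rather than equality. The function name drop_duplicate_plates is hypothetical.

```python
def drop_duplicate_plates(items):
    """Return a new list with the first dict seen for each 'plate' value."""
    seen = set()
    unique = []
    for item in items:
        plate = item["plate"]
        # Set membership compares by equality (via hashing), not identity.
        if plate not in seen:
            seen.add(plate)
            unique.append(item)
    return unique


datalist = [
    {"plate": "01", "confidence": "80"},
    {"plate": "01", "confidence": "60"},
    {"plate": "02", "confidence": "91"},
    {"plate": "02", "confidence": "91"},
]

print(drop_duplicate_plates(datalist))
# [{'plate': '01', 'confidence': '80'}, {'plate': '02', 'confidence': '91'}]
```

Unlike comparing against only the previous plate, tracking every plate seen so far also works when duplicates are not adjacent, and the input list is left unmodified.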
