I want to remove duplicate dictionaries from a list of dictionaries. Two dictionaries count as duplicates if they have the same 'plate' value, and I want each plate to appear only once.

datalist = [

{
    'plate': "01",
    'confidence' : "80"
},

{
    'plate': "01",
    'confidence' : "60"
},

{
    'plate': "02",
    'confidence' : "91"
},

{
    'plate': "02",
    'confidence' : "91"
},
]

My output should be like this:

datalist = [

{
    'plate': "01",
    'confidence' : "80"
},

{
    'plate': "02",
    'confidence' : "91"
},
]

This is my code, but it doesn't give the expected result:

def filter(datalist):
    previous = ""
    for data in datalist:
        current  = data['plate']
        if current is previous:
            datalist.remove(data)
        previous = current 

    return datalist

datalist = [

    {
        'plate': "01",
        'confidence' : "80"
    },

    {
        'plate': "01",
        'confidence' : "60"
    },

    {
        'plate': "02",
        'confidence' : "91"
    },

    {
        'plate': "02",
        'confidence' : "91"
    },
]


print (filter(datalist))

This gives me the output:

[{'plate': '01', 'confidence': '80'}, {'plate': '02', 'confidence': '91'}, {'plate': '02', 'confidence': '91'}]

which is not what I expected. What's wrong with my code?
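For reference, the output above comes from two issues in the loop. `current is previous` tests whether both names refer to the same object, not whether the strings are equal, so `==`/`!=` should be used to compare values. More importantly, calling `datalist.remove(data)` while iterating over `datalist` shifts the remaining items one position left, so the item right after a removed one is never visited: here the first `'02'` dict is skipped, and the second `'02'` dict is compared against `'01'` and kept. A minimal sketch of a fix that keeps the original "compare with the previous plate" idea but builds a new list instead of mutating the one being iterated. The name `filter_plates` is hypothetical (it also avoids shadowing the built-in `filter`), and like the original it assumes duplicates are adjacent, e.g. after sorting by plate:

```python
def filter_plates(datalist):
    """Return a new list, dropping any dict whose 'plate' equals the previous dict's plate.

    Assumes duplicates are adjacent (for example, the list is sorted by 'plate').
    """
    result = []
    previous = None
    for data in datalist:
        current = data['plate']
        if current != previous:  # value comparison, not identity (`is`)
            result.append(data)
        previous = current
    return result  # the input list is left unchanged


datalist = [
    {'plate': "01", 'confidence': "80"},
    {'plate': "01", 'confidence': "60"},
    {'plate': "02", 'confidence': "91"},
    {'plate': "02", 'confidence': "91"},
]

print(filter_plates(datalist))
# [{'plate': '01', 'confidence': '80'}, {'plate': '02', 'confidence': '91'}]
```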

Khaalidi
  • Related, not exact duplicate as here we only want to consider one key when considering duplicates: [Remove duplicate dict in list in Python](https://stackoverflow.com/questions/9427163/remove-duplicate-dict-in-list-in-python) – jpp Jan 04 '19 at 13:04
  • you can also use pandas: import pandas as pd; df = pd.DataFrame(data=datalist); df.drop_duplicates(subset=['plate'], keep='first', inplace=True); output = df.to_dict(orient='records') – LMSharma Jan 04 '19 at 13:08

7 Answers

If any element from the groups of duplicates is acceptable, you could do:

datalist = [
    {'plate': "01", 'confidence': "80"},
    {'plate': "01", 'confidence': "60"},
    {'plate': "02", 'confidence': "91"},
    {'plate': "02", 'confidence': "91"},
]

result = list({d['plate']: d for d in datalist}.values())
print(result)

Output

[{'plate': '01', 'confidence': '60'}, {'plate': '02', 'confidence': '91'}]

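The idea is to create a dictionary where the keys are the values of plate and the values are the dictionaries themselves. A later dictionary with the same plate overwrites the earlier one, so the last entry for each plate is kept. If you want to keep the first duplicate entries, iterate with reversed (the result then also comes out in reverse order):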

result = list({d['plate']: d for d in reversed(datalist)}.values())

Output

[{'plate': '02', 'confidence': '91'}, {'plate': '01', 'confidence': '80'}]
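Written as an explicit loop, the first comprehension above is roughly the following sketch. In Python 3.7+, where dicts preserve insertion order, assigning to an existing key replaces its value without moving the key, which is why the plates stay in order of first appearance while the last dict for each plate wins:

```python
datalist = [
    {'plate': "01", 'confidence': "80"},
    {'plate': "01", 'confidence': "60"},
    {'plate': "02", 'confidence': "91"},
    {'plate': "02", 'confidence': "91"},
]

by_plate = {}
for d in datalist:
    # A later dict with the same plate replaces the earlier value,
    # but the key keeps the position of its first insertion.
    by_plate[d['plate']] = d

result = list(by_plate.values())
print(result)
# [{'plate': '01', 'confidence': '60'}, {'plate': '02', 'confidence': '91'}]
```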
Dani Mesejo
  • This produces the wrong result because the last duplicated entries are kept, not the first. – timgeb Jan 04 '19 at 13:08
  • @timgeb OP didn't clearly specify that they want the first entry - only that they don't want duplicate entries. – Matthias Fischer Jan 04 '19 at 13:09
  • @MatthiasFischer OP specified the desired output. – timgeb Jan 04 '19 at 13:10
  • you can still do this by replacing `datalist` with `datalist[::-1]` or `reversed(datalist)` for something more loop-efficient. – Ma0 Jan 04 '19 at 13:10
  • @timgeb For the first example, but we don't know if they always want the first entry. You're probably right and I'm being pedantic, but it's not explicitly stated. :) – Matthias Fischer Jan 04 '19 at 13:14
  • @timgeb Updated the answer! – Dani Mesejo Jan 04 '19 at 13:15
  • @DanielMesejo Thank you so much for the quick response. It works the way I wanted. The only piece of the puzzle left was to preserve the order, and for that I added another line of code, `results_sorted = sorted(results, key=lambda k: k['epoch_time'])`. Fortunately my data has an "Epoch Time" key, which (along with a few other keys) I removed from the question for clarity. I sorted the dictionaries by epoch time and it is fine now. – Khaalidi Jan 05 '19 at 19:42

Assuming you want to keep only the first duplicated dict found, you can use dict.setdefault():

datalist = [
    {"plate": "01", "confidence": "80"},
    {"plate": "01", "confidence": "60"},
    {"plate": "02", "confidence": "91"},
    {"plate": "02", "confidence": "91"},
]

result = {}
for d in datalist:
    result.setdefault(d["plate"], d)

print(list(result.values()))
# [{'plate': '01', 'confidence': '80'}, {'plate': '02', 'confidence': '91'}]

If you instead want the last duplicates, simply iterate over reversed(datalist).
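`setdefault(key, default)` stores `default` only when `key` is missing and returns whatever value is stored for `key`; it never overwrites, which is why the first dict for each plate survives. A short sketch of that behaviour and of the reversed variant. Note that iterating in reverse also yields the plates in reverse order of their last appearance:

```python
datalist = [
    {"plate": "01", "confidence": "80"},
    {"plate": "01", "confidence": "60"},
    {"plate": "02", "confidence": "91"},
    {"plate": "02", "confidence": "91"},
]

# setdefault inserts only when the key is missing and returns the stored value.
demo = {}
print(demo.setdefault("01", "first"))   # first
print(demo.setdefault("01", "second"))  # first (existing value is kept)

# Iterating in reverse therefore keeps the *last* dict for each plate.
result = {}
for d in reversed(datalist):
    result.setdefault(d["plate"], d)

print(list(result.values()))
# [{'plate': '02', 'confidence': '91'}, {'plate': '01', 'confidence': '60'}]
```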

RoadRunner

You can use the unique_everseen recipe from the itertools documentation, also available in the third-party more_itertools library:

from more_itertools import unique_everseen
from operator import itemgetter    

datalist = list(unique_everseen(datalist, key=itemgetter('plate')))

Internally, unique_everseen keeps a set of the plates it has already seen and yields a dictionary only when its plate is new. Because it walks the list in order, the original ordering is preserved and only the first dictionary for each plate is kept.
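If you'd rather avoid the third-party dependency, the core idea needs only a set. A minimal sketch, assuming hashable keys (more_itertools' version also handles unhashable ones, more slowly); `unique_everseen_simple` is a hypothetical name, not the library function:

```python
from operator import itemgetter


def unique_everseen_simple(iterable, key=None):
    """Yield elements in order, skipping any whose key was already seen.

    Simplified sketch of the unique_everseen idea; keys must be hashable.
    """
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element


datalist = [
    {'plate': "01", 'confidence': "80"},
    {'plate': "01", 'confidence': "60"},
    {'plate': "02", 'confidence': "91"},
    {'plate': "02", 'confidence': "91"},
]

print(list(unique_everseen_simple(datalist, key=itemgetter('plate'))))
# [{'plate': '01', 'confidence': '80'}, {'plate': '02', 'confidence': '91'}]
```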

jpp

If you are a pandas user, you can consider DataFrame.drop_duplicates:

>>> import pandas as pd
>>> datalist = [{'plate': "01", 'confidence': "80"}, {'plate': "01", 'confidence': "60"}, {'plate': "02", 'confidence': "91"}, {'plate': "02", 'confidence': "91"}]
>>> pd.DataFrame(datalist).drop_duplicates('plate').to_dict(orient='records')
[{'plate': '01', 'confidence': '80'}, {'plate': '02', 'confidence': '91'}]

If you want to keep the last seen duplicates, pass keep='last'.

>>> pd.DataFrame(datalist).drop_duplicates('plate', keep='last').to_dict(orient='records')
[{'plate': '01', 'confidence': '60'}, {'plate': '02', 'confidence': '91'}]
timgeb

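You can also use pandas:

import pandas as pd
df = pd.DataFrame(data=datalist)
df.drop_duplicates(subset=['plate'], keep='first', inplace=True)
output = df.to_dict(orient='records')

keep='first' or keep='last' controls which entry of each duplicate group ends up in the output. Older pandas versions also accepted abbreviated orient values such as 'record', but newer versions require the full name 'records'.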

LMSharma
  • Huh, I did not know that you could pass `'record'` instead of `'records'`. This seems to be undocumented. +1 for obscure knowledge :) – timgeb Jan 04 '19 at 13:23
  • Update: seems like you can pass `'r'`, `'re'`, ..., `'records'`, `'recordsasdf'`, ... – timgeb Jan 04 '19 at 13:25
  • :D Thanks @timgeb. We can use anything that starts with 'r'. I have just checked the source code. http://github.com/pandas-dev/pandas/blob/v0.23.4/pandas/core/frame.py#L987-L1102 – LMSharma Jan 04 '19 at 14:12

You can use a single itertools.groupby:

from itertools import groupby

list(map(lambda x: next(x[1]), groupby(sorted(datalist, key=lambda d: d['plate']), lambda d: d['plate'])))

Results:

[{'plate': '01', 'confidence': '80'}, {'plate': '02', 'confidence': '91'}]
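Split into steps, the one-liner reads roughly as the sketch below. `groupby` only merges consecutive items with equal keys, which is why the list is sorted by plate first; `sorted` is stable, so each group keeps the original relative order and `next(group)` returns the first dict for that plate. Note that the result comes out ordered by plate rather than by original position:

```python
from itertools import groupby
from operator import itemgetter

datalist = [
    {'plate': "01", 'confidence': "80"},
    {'plate': "01", 'confidence': "60"},
    {'plate': "02", 'confidence': "91"},
    {'plate': "02", 'confidence': "91"},
]
by_plate = itemgetter('plate')

# groupby only merges *consecutive* equal keys, so sort by plate first.
# sorted() is stable, so dicts with the same plate keep their original order.
ordered = sorted(datalist, key=by_plate)

result = []
for plate, group in groupby(ordered, key=by_plate):
    result.append(next(group))  # first dict seen for this plate

print(result)
# [{'plate': '01', 'confidence': '80'}, {'plate': '02', 'confidence': '91'}]
```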
hamza tuna

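Good old verbose for loop, then a list comprehension (note that this rebuilds each dict from 'plate' and 'confidence' only, so any other keys are dropped):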

tmp = []
for dct in datalist:
    if not any(e[0] == dct["plate"] for e in tmp):
        tmp.append((dct["plate"], dct["confidence"]))

[{"plate": plate, "confidence": confidence} for plate, confidence in tmp]
#=> [{'plate': '01', 'confidence': '80'}, {'plate': '02', 'confidence': '91'}]
iGian