
I have a list of dictionaries.

my_list = [
    {"id": "UU7t", "updated_at": "2020-01-06_16-40-00", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-07_18-24-22", "summary": "Renewed"},
    {"id": "i8Po", "updated_at": "2020-01-08_13-16-36", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-13_18-24-05", "summary": "Deleted"},
    {"id": "7uYg", "updated_at": "2020-01-18_23-37-19", "summary": "Transferred"},
]

I want to remove duplicate dictionaries that have the same "id", keeping only the one whose "updated_at" is the latest.

So, my final list will be:

my_list = [
    {"id": "UU7t", "updated_at": "2020-01-06_16-40-00", "summary": "Renewed"},
    {"id": "i8Po", "updated_at": "2020-01-08_13-16-36", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-13_18-24-05", "summary": "Deleted"},
    {"id": "7uYg", "updated_at": "2020-01-18_23-37-19", "summary": "Transferred"},
]

What would be an efficient way to do this?

smack cherry
Fahad Ahammed
  • Does this answer your question? [Remove duplicate dict in list in Python](https://stackoverflow.com/questions/9427163/remove-duplicate-dict-in-list-in-python) – hod Jan 22 '20 at 05:27

4 Answers


You could use a dict for accumulating the items.

The dictionary can store the id as the key and the list item as the value. Insert an item only if no item with the same key exists yet; if one does, compare the updated_at values and replace the stored item if the new one is more recent.

def generate_new_list(my_list):
    counts = {}
    for d in my_list:
        item_id = d['id']
        if item_id in counts:
            if d['updated_at'] > counts[item_id]['updated_at']:
                counts[item_id] = d
        else:
            counts[item_id] = d

    return list(counts.values())
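As a quick check, here is the function above applied to the question's list. This is a sketch that copies `my_list` and `generate_new_list` so it runs on its own; the printed lines assume Python 3.7+ dict ordering:

```python
def generate_new_list(my_list):
    counts = {}  # maps each id to the most recent record seen so far
    for d in my_list:
        item_id = d['id']
        if item_id in counts:
            # Replace only if this record is newer than the stored one.
            if d['updated_at'] > counts[item_id]['updated_at']:
                counts[item_id] = d
        else:
            counts[item_id] = d

    return list(counts.values())


my_list = [
    {"id": "UU7t", "updated_at": "2020-01-06_16-40-00", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-07_18-24-22", "summary": "Renewed"},
    {"id": "i8Po", "updated_at": "2020-01-08_13-16-36", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-13_18-24-05", "summary": "Deleted"},
    {"id": "7uYg", "updated_at": "2020-01-18_23-37-19", "summary": "Transferred"},
]

result = generate_new_list(my_list)
for row in result:
    print(row["id"], row["updated_at"], row["summary"])
# UU7t 2020-01-06_16-40-00 Renewed
# yT8h 2020-01-13_18-24-05 Deleted
# i8Po 2020-01-08_13-16-36 Renewed
# 7uYg 2020-01-18_23-37-19 Transferred
```

The contents match the question's expected list, but yT8h stays at the position where its id was first seen, so the order differs; the notes below cover how to change that.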

A few more notes:

  • If you want to keep the original ordering, either use Python 3.7+ (which guarantees that dicts preserve insertion order) or use OrderedDict. Note that replacing the value of an existing key does not change its position, so each item is output in the order its id was first seen. To place an item where its newest entry appeared instead, with a standard dict you have to pop the old entry before reinserting it, while OrderedDict supports this directly via move_to_end.
  • You could also remove the special case by using dict.get and the "null object pattern":

    MISSING = {'updated_at': ''}  # pseudo-entry: '' sorts before every other string
    def generate_new_list(my_list):
        counts = {}
        for d in my_list:
            # Compare with the stored entry's timestamp, or MISSING's if the id is new.
            if d['updated_at'] > counts.get(d['id'], MISSING)['updated_at']:
                counts[d['id']] = d

        return list(counts.values())

  • A non-dict alternative (though one that does not preserve the original order) is to sort by (id, updated_at), group by id, then keep only the last entry of each group. I don't think the stdlib provides that last operation out of the box (islice doesn't accept negative indices), so you'd either have to do it by hand or reify each group into a list first.
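To make the ordering note concrete, here is a sketch of both variants it describes (assuming Python 3.7+ for the plain-dict one; the function names are hypothetical). Each moves an id to the end whenever a newer record replaces it, which for this data happens to reproduce the question's expected order exactly:

```python
from collections import OrderedDict


def latest_in_update_order(records):
    """Plain dict (Python 3.7+): pop, then re-insert to move a key to the end."""
    latest = {}
    for rec in records:
        current = latest.get(rec["id"])
        if current is None or rec["updated_at"] > current["updated_at"]:
            latest.pop(rec["id"], None)  # forget the old position, if any
            latest[rec["id"]] = rec      # re-insert at the end
    return list(latest.values())


def latest_in_update_order_od(records):
    """OrderedDict: overwrite the value, then move_to_end."""
    latest = OrderedDict()
    for rec in records:
        current = latest.get(rec["id"])
        if current is None or rec["updated_at"] > current["updated_at"]:
            latest[rec["id"]] = rec
            latest.move_to_end(rec["id"])
    return list(latest.values())


my_list = [
    {"id": "UU7t", "updated_at": "2020-01-06_16-40-00", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-07_18-24-22", "summary": "Renewed"},
    {"id": "i8Po", "updated_at": "2020-01-08_13-16-36", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-13_18-24-05", "summary": "Deleted"},
    {"id": "7uYg", "updated_at": "2020-01-18_23-37-19", "summary": "Transferred"},
]

# The final list from the question.
expected = [
    {"id": "UU7t", "updated_at": "2020-01-06_16-40-00", "summary": "Renewed"},
    {"id": "i8Po", "updated_at": "2020-01-08_13-16-36", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-13_18-24-05", "summary": "Deleted"},
    {"id": "7uYg", "updated_at": "2020-01-18_23-37-19", "summary": "Transferred"},
]

print(latest_in_update_order(my_list) == expected)     # True
print(latest_in_update_order_od(my_list) == expected)  # True
```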
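A sketch of the sort-and-group alternative, assuming records shaped like the question's. One way to do the "keep the last entry" step by hand, without building a list per group, is `collections.deque(group, maxlen=1)`, which consumes the group and retains only its final item; `latest_by_sorting` is a hypothetical name:

```python
from collections import deque
from itertools import groupby
from operator import itemgetter


def latest_by_sorting(records):
    # Sort by (id, updated_at) so the newest record is last within each id.
    ordered = sorted(records, key=itemgetter("id", "updated_at"))
    # deque(maxlen=1) keeps only the last item of each group;
    # list(group)[-1] would also work but builds the whole group first.
    return [deque(group, maxlen=1)[0]
            for _, group in groupby(ordered, key=itemgetter("id"))]


my_list = [
    {"id": "UU7t", "updated_at": "2020-01-06_16-40-00", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-07_18-24-22", "summary": "Renewed"},
    {"id": "i8Po", "updated_at": "2020-01-08_13-16-36", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-13_18-24-05", "summary": "Deleted"},
    {"id": "7uYg", "updated_at": "2020-01-18_23-37-19", "summary": "Transferred"},
]

result = latest_by_sorting(my_list)
print([r["id"] for r in result])  # ['7uYg', 'UU7t', 'i8Po', 'yT8h']
```

As the note says, the result comes out sorted by id rather than in the original order, and the sort makes this O(n log n) rather than the single O(n) pass of the dict approach.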
jignatius

One way of doing this is to restructure the data into a dict keyed by "id".

my_list = [
    {"id": "UU7t", "updated_at": "2020-01-06_16-40-00", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-07_18-24-22", "summary": "Renewed"},
    {"id": "i8Po", "updated_at": "2020-01-08_13-16-36", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-13_18-24-05", "summary": "Deleted"},
    {"id": "7uYg", "updated_at": "2020-01-18_23-37-19", "summary": "Transferred"},
]

def getNewUpdated(myList):
    newList = {}
    for element in myList:
        if (element["id"] not in newList):
            newList[element["id"]] = element
        elif (element["updated_at"] >= newList[element["id"]]["updated_at"]):
            newList[element["id"]] = element
    return newList

print(getNewUpdated(my_list))

Here, the data is restructured so that each "id" is a key and the corresponding record is its value. The function iterates over your list and checks whether the "id" already exists in newList: if it does, the stored record is replaced (provided the new "updated_at" is at least as recent); otherwise a new record is added. If you need a list again, call list(newList.values()) on the result.

On Python 3.7+ (where dicts keep insertion order), the output, reformatted for readability, is:

{
 'UU7t': {'id': 'UU7t', 'updated_at': '2020-01-06_16-40-00', 'summary': 'Renewed'},
 'yT8h': {'id': 'yT8h', 'updated_at': '2020-01-13_18-24-05', 'summary': 'Deleted'},
 'i8Po': {'id': 'i8Po', 'updated_at': '2020-01-08_13-16-36', 'summary': 'Renewed'},
 '7uYg': {'id': '7uYg', 'updated_at': '2020-01-18_23-37-19', 'summary': 'Transferred'}
}
  • You could use datetime to compare the dates, and setdefault(key, value) instead of if (element["id"] not in newList): – m0r7y Jan 22 '20 at 05:56
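A sketch of that comment's suggestion, assuming `updated_at` always uses the `YYYY-MM-DD_HH-MM-SS` layout shown in the question. `FMT` and `parse` are hypothetical helpers, and `get_new_updated` is a snake_case adaptation of `getNewUpdated` above:

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "2020-01-06_16-40-00".
FMT = "%Y-%m-%d_%H-%M-%S"


def parse(ts):
    return datetime.strptime(ts, FMT)


def get_new_updated(my_list):
    new_list = {}
    for element in my_list:
        # setdefault stores element if the id is new and returns whatever is stored.
        stored = new_list.setdefault(element["id"], element)
        # >= mirrors the answer above: on a tie, the later record wins.
        if parse(element["updated_at"]) >= parse(stored["updated_at"]):
            new_list[element["id"]] = element
    return new_list


my_list = [
    {"id": "UU7t", "updated_at": "2020-01-06_16-40-00", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-07_18-24-22", "summary": "Renewed"},
    {"id": "i8Po", "updated_at": "2020-01-08_13-16-36", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-13_18-24-05", "summary": "Deleted"},
    {"id": "7uYg", "updated_at": "2020-01-18_23-37-19", "summary": "Transferred"},
]

result = get_new_updated(my_list)
print(list(result))  # ['UU7t', 'yT8h', 'i8Po', '7uYg']
```

Because these timestamps are zero-padded and ordered from year down to second, plain string comparison already ranks them the same way; parsing mainly adds validation, since `strptime` raises `ValueError` on a malformed timestamp.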

Two solutions, one using a dict and the other by sorting and grouping:

from itertools import groupby

my_list = [
    {"id": "UU7t", "updated_at": "2020-01-06_16-40-00", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-07_18-24-22", "summary": "Renewed"},
    {"id": "i8Po", "updated_at": "2020-01-08_13-16-36", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-13_18-24-05", "summary": "Deleted"},
    {"id": "7uYg", "updated_at": "2020-01-18_23-37-19", "summary": "Transferred"},
]


def newest_id(seq):
    """Keep id with most recent updated_at

    Return a list of kept items.
    """
    td = {}
    for e in seq:
        key = e['id']
        if key not in td or td[key]['updated_at'] < e['updated_at']:
            td[key] = e
    return list(td.values())


def newest_id2(seq):
    """Keep id with most recent updated_at

    Return a sorted list of kept items.
    """
    tl = sorted(seq, key=lambda e: (e['id'], e['updated_at']), reverse=True)
    return [next(g) for _, g in groupby(tl, key=lambda e: e['id'])]


res1 = newest_id(my_list)
res2 = newest_id2(my_list)

# Check result

res1.sort(key=lambda e: e['id'], reverse=True)
print(res1 == res2)

FredrikHedman

Using pandas

import pandas as pd

df = pd.DataFrame(my_list)
df = df.sort_values(by="updated_at").drop_duplicates(subset=["id"], keep="last")

my_list = df.to_dict(orient="records")

Output:

[{'id': 'UU7t', 'summary': 'Renewed', 'updated_at': '2020-01-06_16-40-00'},
 {'id': 'i8Po', 'summary': 'Renewed', 'updated_at': '2020-01-08_13-16-36'},
 {'id': 'yT8h', 'summary': 'Deleted', 'updated_at': '2020-01-13_18-24-05'},
 {'id': '7uYg', 'summary': 'Transferred', 'updated_at': '2020-01-18_23-37-19'}]
Sociopath