Remove similar dictionaries from a list in Python

Question

I have a list of dictionaries that are similar but not completely identical, and I want to keep only one from each set of similar ones.

Example:

my_list = [
{"name" : "A","id" : 2,"value" : 279},
{"name" : "A","id" : 3,"value" : 463},
{"name" : "B","id" : 8,"value" : 508},
{"name" : "A","id" : 2,"value" : 647},
{"name" : "A","id" : 2,"value" : 969},
{"name" : "C","id" : 5,"value" : 384}]

I want to remove the dictionaries that share the same "name" and "id", keeping only the one with the highest "value".

Example of what I want the result to be:

my_list = [
{"name" : "A","id" : 3,"value" : 463},
{"name" : "B","id" : 8,"value" : 508},
{"name" : "A","id" : 2,"value" : 969},
{"name" : "C","id" : 5,"value" : 384}]

The entries that got removed are:

{"name" : "A","id" : 2,"value" : 279},
{"name" : "A","id" : 2,"value" : 647}

because {"name" : "A","id" : 2,"value" : 969} has a higher "value".

{"name" : "A","id" : 3,"value" : 463} didn't get removed because its "id" is different.

How can I do that?

I tried looking at some questions like

How to remove duplicate elements of, list of dictionaries in python

Is the order of the resulting list important? – dawg, Aug 14 '21 at 12:53

score 3 · Answer 1 · answered Aug 14 '21 at 12:29

Try:

my_list = [
    {"name": "A", "id": 2, "value": 279},
    {"name": "A", "id": 3, "value": 463},
    {"name": "B", "id": 8, "value": 508},
    {"name": "A", "id": 2, "value": 647},
    {"name": "A", "id": 2, "value": 969},
    {"name": "C", "id": 5, "value": 384},
]

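# Sort by ascending "value" so that, for each (name, id) key, the entry
# with the highest value is assigned last and overwrites the lower ones.
out = {}
for d in sorted(my_list, key=lambda k: k["value"]):
    out[(d["name"], d["id"])] = d

print(list(out.values()))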

Prints:

[
    {"name": "A", "id": 2, "value": 969},
    {"name": "C", "id": 5, "value": 384},
    {"name": "A", "id": 3, "value": 463},
    {"name": "B", "id": 8, "value": 508},
]

I knew this question would get a load of answers immediately! I definitely prefer this solution though: no extra imports, short, and easily understandable. — Robson, Aug 14 '21 at 12:35

score 2 · Answer 2 · answered Aug 14 '21 at 12:30

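import itertools

# Sort so that equal (name, id) pairs are adjacent, highest value first.
my_list.sort(key=lambda d: (d["name"], d["id"], -d["value"]))

# groupby yields one group per (name, id); its first item has the highest value.
for _key, group in itertools.groupby(
    my_list,
    key=lambda d: (d["name"], d["id"]),
):
    print(next(group))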

Alex Hall
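
A note on the -d["value"] trick in the snippet above: it assumes the values are numbers that can be negated. As a rough sketch for values that cannot be negated (strings, for example), one option is to rely on Python's stable sort and sort twice. The function name keep_highest_sorted is made up for this illustration, and my_list is the sample data from the question. Unlike list.sort, this version leaves the input list untouched.

```python
from itertools import groupby
from operator import itemgetter

# Sample data from the question.
my_list = [
    {"name": "A", "id": 2, "value": 279},
    {"name": "A", "id": 3, "value": 463},
    {"name": "B", "id": 8, "value": 508},
    {"name": "A", "id": 2, "value": 647},
    {"name": "A", "id": 2, "value": 969},
    {"name": "C", "id": 5, "value": 384},
]


def keep_highest_sorted(items):
    """Return one dict per (name, id): the one with the highest value."""
    group_key = itemgetter("name", "id")
    # Pass 1: highest value first.
    by_value = sorted(items, key=itemgetter("value"), reverse=True)
    # Pass 2: stable sort by (name, id), so within each group the
    # highest-value entry stays at the front.
    ordered = sorted(by_value, key=group_key)
    return [next(group) for _, group in groupby(ordered, key=group_key)]


result = keep_highest_sorted(my_list)
print(result)
```

The result comes back ordered by (name, id), not in the order of the original list.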


score 2 · Answer 3 · answered Aug 14 '21 at 12:31

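Group the entries by (name, id) with a defaultdict, then take the entry with the highest value from each group: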

from collections import defaultdict

my_list = [
    {"name": "A", "id": 2, "value": 279},
    {"name": "A", "id": 3, "value": 463},
    {"name": "B", "id": 8, "value": 508},
    {"name": "A", "id": 2, "value": 647},
    {"name": "A", "id": 2, "value": 969},
    {"name": "C", "id": 5, "value": 384}]

data = defaultdict(list)
for entry in my_list:
    data[entry['name'], entry["id"]].append(entry)
new_data = []
for k, v in data.items():
    new_data.append(max(v, key=lambda x: x['value']))
print(new_data)

output

[{'name': 'A', 'id': 2, 'value': 969}, {'name': 'A', 'id': 3, 'value': 463}, {'name': 'B', 'id': 8, 'value': 508}, {'name': 'C', 'id': 5, 'value': 384}]
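
The output above lists each result at the position where its (name, id) key first appeared. If each kept dictionary should instead stay at its own position in the original list, as in the question's expected output, a single pass plus a filter might be enough. This is only a sketch: the function name keep_highest_keep_order is hypothetical, and it assumes every list element is a distinct dict object.

```python
# Sample data from the question.
my_list = [
    {"name": "A", "id": 2, "value": 279},
    {"name": "A", "id": 3, "value": 463},
    {"name": "B", "id": 8, "value": 508},
    {"name": "A", "id": 2, "value": 647},
    {"name": "A", "id": 2, "value": 969},
    {"name": "C", "id": 5, "value": 384},
]


def keep_highest_keep_order(items):
    """Keep the highest-value dict per (name, id), in original list order."""
    best = {}
    for entry in items:
        key = (entry["name"], entry["id"])
        # Strict ">" keeps the earliest entry when two values tie.
        if key not in best or entry["value"] > best[key]["value"]:
            best[key] = entry
    # Filter by object identity so an equal-looking loser is not kept too.
    # Assumption: no dict object appears in the list more than once.
    winners = {id(entry) for entry in best.values()}
    return [entry for entry in items if id(entry) in winners]


result = keep_highest_keep_order(my_list)
print(result)
```

No sorting is involved, so this runs in one pass over the list plus one pass for the filter.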

score 1 · Answer 4 · answered Aug 14 '21 at 12:29

If you don't mind using pandas (a little overkill) you can create a DataFrame, sort by value and then drop duplicates on just name and id.

import pandas as pd

df = pd.DataFrame(my_list)
out_list = (
    df.sort_values("value", ascending=False)
    .drop_duplicates(["name", "id"], keep="first")
    .to_dict(orient="records")
)

Which outputs:

[{'name': 'A', 'id': 2, 'value': 969},
 {'name': 'B', 'id': 8, 'value': 508},
 {'name': 'A', 'id': 3, 'value': 463},
 {'name': 'C', 'id': 5, 'value': 384}]

eroot163pi · Answer 5 · 2021-08-14T13:43:53.493

First sort the list so that all entries with the same (name, id) are adjacent, then take the element with the maximum value from each group:

from itertools import groupby
from operator import itemgetter

my_list = [
    {"name": "A", "id": 2, "value": 279},
    {"name": "A", "id": 3, "value": 463},
    {"name": "B", "id": 8, "value": 508},
    {"name": "A", "id": 2, "value": 647},
    {"name": "A", "id": 2, "value": 969},
    {"name": "C", "id": 5, "value": 384},
]

# Sort by name first, then by id, so equal (name, id) pairs are adjacent.
my_list.sort(key=itemgetter("name", "id"))

# From each (name, id) group, take the element with the largest value.
result = [
    max(g, key=itemgetter("value"))
    for _, g in groupby(my_list, key=itemgetter("name", "id"))
]
print(result)
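
To see why the sort matters: itertools.groupby only merges consecutive items with equal keys, so on the unsorted list the same (name, id) pair can come back as more than one group. A small sketch using the question's data (the variable names unsorted_keys and sorted_keys are just for this illustration):

```python
from itertools import groupby
from operator import itemgetter

# Sample data from the question, in its original order.
my_list = [
    {"name": "A", "id": 2, "value": 279},
    {"name": "A", "id": 3, "value": 463},
    {"name": "B", "id": 8, "value": 508},
    {"name": "A", "id": 2, "value": 647},
    {"name": "A", "id": 2, "value": 969},
    {"name": "C", "id": 5, "value": 384},
]
group_key = itemgetter("name", "id")

# Without sorting, ("A", 2) appears as two separate groups, because the
# entry at index 0 is not adjacent to the entries at indices 3 and 4.
unsorted_keys = [key for key, _ in groupby(my_list, key=group_key)]

# After sorting by the same key, each (name, id) pair forms exactly one group.
sorted_keys = [
    key for key, _ in groupby(sorted(my_list, key=group_key), key=group_key)
]

print(unsorted_keys)  # [('A', 2), ('A', 3), ('B', 8), ('A', 2), ('C', 5)]
print(sorted_keys)    # [('A', 2), ('A', 3), ('B', 8), ('C', 5)]
```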
