
How can I remove a dict from the `results` list when its model, URL and price_int are the same as another entry's (i.e. it is a duplicate)? Sample JSON:

[{
    "id": 1,
    "results": [
        {
            "model": "Audi Audi TT Roadster",
            "price_int": 2200,
            "rzc_result_url": "https://url1.jpg"
        },
        {
            "model": "Audi TT Roadster 1.8 T",
            "price_int": 2999,
            "rzc_result_url": "https://url1.jpg"
        },
        {
            "model": "Audi TT Roadster 1.8 T",
            "price_int": 2999,
            "rzc_result_url": "https://url1.jpg"
        }]
},
...

]

Expected output:

[{
    "id": 1,
    "results": [
        {
            "model": "Audi Audi TT Roadster",
            "price_int": 2200,
            "rzc_result_url": "https://url1.jpg"
        },
        {
            "model": "Audi TT Roadster 1.8 T",
            "price_int": 2999,
            "rzc_result_url": "https://url1.jpg"
        }]
},
...

]

Code:

def removeDoubles():
    results = item["results"]
    if not results == []:
        for result in results:
            urlList = result["rzc_result_url"]
            modelList = result["model"]
            priceIntList = result["price_int"]
            # ... What to do ?
removeDoubles()

I know I'm far from a solution, but how can I remove the duplicates based on these three key/value pairs?

lf_celine
  • You have `price_str` and `price_int` keys. Should they be interpreted equally? They are both integers – hurlenko Mar 17 '20 at 15:47
  • Sorry, that was an error. Yes, all the keys are `price_int`; I edited the question. – lf_celine Mar 17 '20 at 15:49
  • You can refer to this post if it helps https://stackoverflow.com/questions/9427163/remove-duplicate-dict-in-list-in-python – Abhishek Kulkarni Mar 17 '20 at 15:51
  • Does this answer your question? [Remove duplicate dict in list in Python](https://stackoverflow.com/questions/9427163/remove-duplicate-dict-in-list-in-python) – stovfl Mar 17 '20 at 15:53

3 Answers


You can compare dicts directly to check if they have the same keys/values.

from pprint import pprint
data = [
    {
        "id": 1,
        "results": [
            {
                "model": "Audi Audi TT Roadster",
                "price_int": 2200,
                "rzc_result_url": "https://url1.jpg",
            },
            {
                "model": "Audi TT Roadster 1.8 T",
                "price_int": 2999,
                "rzc_result_url": "https://url1.jpg",
            },
            {
                "rzc_result_url": "https://url1.jpg",
                "model": "Audi TT Roadster 1.8 T",
                "price_int": 2999,
            },
        ],
    },
]

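for item in data:
    results = item['results']
    # Keep a result only if no equal dict appears later in the list,
    # so the last copy of each duplicate is the one that survives.
    item['results'] = [
        result for i, result in enumerate(results)
        if result not in results[i + 1:]
    ]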

pprint(data)

Prints:

[{'id': 1,
  'results': [{'model': 'Audi Audi TT Roadster',
               'price_int': 2200,
               'rzc_result_url': 'https://url1.jpg'},
              {'model': 'Audi TT Roadster 1.8 T',
               'price_int': 2999,
               'rzc_result_url': 'https://url1.jpg'}]}]
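
If only the three fields named in the question should decide whether two results are duplicates, and the first occurrence should be kept, a single pass with a set of already-seen key tuples is another option. This is only a minimal sketch: the helper name `dedupe_results` and the `KEYS` tuple are made up for illustration, and it assumes every result has all three keys with hashable values.

```python
# Fields that define a duplicate (an assumption taken from the question).
KEYS = ("model", "price_int", "rzc_result_url")


def dedupe_results(results, keys=KEYS):
    """Return results without duplicates, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for result in results:
        # Build a hashable fingerprint from the chosen fields only.
        fingerprint = tuple(result[key] for key in keys)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(result)
    return unique


data = [
    {
        "id": 1,
        "results": [
            {"model": "Audi Audi TT Roadster", "price_int": 2200, "rzc_result_url": "https://url1.jpg"},
            {"model": "Audi TT Roadster 1.8 T", "price_int": 2999, "rzc_result_url": "https://url1.jpg"},
            {"model": "Audi TT Roadster 1.8 T", "price_int": 2999, "rzc_result_url": "https://url1.jpg"},
        ],
    },
]

for item in data:
    item["results"] = dedupe_results(item["results"])
```

Unlike the slicing approach, this does not rescan the rest of the list for every element, and it ignores any extra keys a result might carry.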
hurlenko

The usual way to remove duplicates, when we don't care about order, is to put them in a set.

However, we can't put dicts into a set as-is, because they aren't hashable. Instead, we can convert each dict to a hashable equivalent (which lets the set remove duplicates naturally), and then convert the survivors back to dicts.

def remove_duplicates(dicts):
    # A set made from the hashable equivalent of each dict.
    unique = {frozenset(d.items()) for d in dicts}
    # Now we go backwards, building a list from the dict equivalents.
    return [dict(hashable) for hashable in unique]
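
Two caveats about the function above: the order of the returned list is arbitrary, and every value in each dict must itself be hashable (a nested list or dict raises `TypeError`). If order does matter, one possible variant is sketched below. The name `remove_duplicates_ordered` is hypothetical, and it relies on plain dicts preserving insertion order (Python 3.7+).

```python
def remove_duplicates_ordered(dicts):
    # Map each dict's hashable form to the first dict that produced it.
    # setdefault leaves an existing entry untouched, so later copies are ignored.
    first_seen = {}
    for d in dicts:
        first_seen.setdefault(frozenset(d.items()), d)
    # Values come back in the order their keys were first inserted.
    return list(first_seen.values())


results = [
    {"model": "Audi Audi TT Roadster", "price_int": 2200, "rzc_result_url": "https://url1.jpg"},
    {"model": "Audi TT Roadster 1.8 T", "price_int": 2999, "rzc_result_url": "https://url1.jpg"},
    {"rzc_result_url": "https://url1.jpg", "model": "Audi TT Roadster 1.8 T", "price_int": 2999},
]
unique = remove_duplicates_ordered(results)
```

Because the original dict objects are returned rather than rebuilt, the key order inside each dict is also left as it was.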
Karl Knechtel

You need to iterate over the values under each item's 'results' key and do a membership check before appending to a new list, so that a result is not added if an equal one has already been added:

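# lst is the list of items loaded from the JSON
for x in lst:
    s = []  # fresh list per item, so items neither share nor mix results
    for r in x['results']:
        if r not in s:
            s.append(r)
    x['results'] = s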
Austin