I have the following list of dictionaries:

dict_list = [{"feed_id": 101, "query_id": 201, "bind_id": 301, "qname":"q1"},                    
             {"feed_id": 101, "query_id": 201, "bind_id": 301, "qname":"q2"},                    
             {"feed_id": 103, "query_id": 201, "bind_id": 301, "qname":"q1"},                    
             {"feed_id": 103, "query_id": 202, "bind_id": 301, "qname":"q3"},                    
             {"feed_id": 103, "bind_id": 301, "bname": "b1"},                                     
             {"feed_id": 101, "query_id": 201, "qname":"q1"}]

I want to remove duplicates based on the combination of three keys (feed_id, query_id and bind_id). The resulting list should look like this:

result_dict_list = [{"feed_id": 101, "query_id": 201, "bind_id": 301, "qname":"q1"},            
                    {"feed_id": 103, "query_id": 201, "bind_id": 301, "qname":"q1"},            
                    {"feed_id": 103, "query_id": 202, "bind_id": 301, "qname":"q3"},            
                    {"feed_id": 103, "bind_id": 301, "bname": "b1"},                             
                    {"feed_id": 101, "query_id": 201, "qname":"q1"}]

These are the requirements in terms of object structure:

  1. feed_id exists on every object and cannot be null.
  2. query_id and bind_id are optional. If either one is missing, the object does not need to be checked for duplicates.
  3. All three property values are numbers. If a key exists, its value cannot be null.
  4. Each object may have many other properties, but only feed_id, query_id and bind_id matter when eliminating duplicates.
  5. The order of the list doesn't matter.

What would be the most efficient way to remove duplicates from the list in Python?

Thanks

Gowthaman

2 Answers

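It's fairly straightforward when the dictionaries are compared as a whole (the sample data here leaves out the extra qname and bname properties):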

dict_list = [{"feed_id": 101, "query_id": 201, "bind_id": 301},                    
             {"feed_id": 101, "query_id": 201, "bind_id": 301},                    
             {"feed_id": 103, "query_id": 201, "bind_id": 301},                    
             {"feed_id": 103, "query_id": 202, "bind_id": 301},                    
             {"feed_id": 103, "bind_id": 301},                                     
             {"feed_id": 101, "query_id": 201}]
result_dict_list = []
for d in dict_list:
    if d not in result_dict_list:
        result_dict_list.append(d)

print(result_dict_list)
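
The loop above compares entire dictionaries, so two objects that share feed_id, query_id and bind_id but differ in another property (such as qname in the question) would both be kept. A minimal sketch that compares only the three keys is shown below. The function name dedupe_by_keys and the variable names are made up for illustration, and it assumes the first occurrence of each combination is the one to keep, as in the expected result from the question.

```python
def dedupe_by_keys(dicts, keys=("feed_id", "query_id", "bind_id")):
    """Drop later dicts whose values for all of `keys` repeat an earlier dict.

    Dicts missing any of `keys` are always kept (requirement 2 in the question).
    `dedupe_by_keys` is a hypothetical helper name, not a library function.
    """
    seen = set()      # tuples of key values already encountered
    result = []
    for d in dicts:
        if all(k in d for k in keys):
            signature = tuple(d[k] for k in keys)
            if signature in seen:
                continue  # duplicate on the three keys: skip it
            seen.add(signature)
        result.append(d)
    return result


dict_list = [
    {"feed_id": 101, "query_id": 201, "bind_id": 301, "qname": "q1"},
    {"feed_id": 101, "query_id": 201, "bind_id": 301, "qname": "q2"},
    {"feed_id": 103, "query_id": 201, "bind_id": 301, "qname": "q1"},
    {"feed_id": 103, "query_id": 202, "bind_id": 301, "qname": "q3"},
    {"feed_id": 103, "bind_id": 301, "bname": "b1"},
    {"feed_id": 101, "query_id": 201, "qname": "q1"},
]

result_dict_list = dedupe_by_keys(dict_list)
print(result_dict_list)
```

Because membership tests on a set take constant time on average, this makes a single pass over the list instead of rescanning the result list for every element.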
RMPR

I found a working solution for now, but wanted to check whether there is a better way to do it. This is what I have:

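properties = ['feed_id', 'query_id', 'bind_id']

# Objects that carry all three keys are candidates for de-duplication.
list_with_all_keys = [obj for obj in dict_list
                      if all(key in obj for key in properties)]

# Key each candidate by its (feed_id, query_id, bind_id) tuple; for a
# repeated key, the later object overwrites the earlier one.
dict_with_no_duplicates = {tuple(d[k] for k in properties): d
                           for d in list_with_all_keys}

# Objects missing query_id or bind_id are kept unchanged.
list_with_missing_keys = [obj for obj in dict_list
                          if not all(key in obj for key in properties)]

# In Python 3, dict.values() returns a view, so convert it to a list
# before concatenating.
result_dict = list(dict_with_no_duplicates.values()) + list_with_missing_keys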
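
One thing to be aware of: a dict comprehension keeps the last object seen for each key, while the expected result in the question keeps the first one (qname "q1"). If that matters, dict.setdefault can be used instead, as in the small sketch below. The names complete, last_wins and first_wins and the shortened sample list are only for illustration.

```python
properties = ("feed_id", "query_id", "bind_id")

# Shortened sample data, taken from the question.
dict_list = [
    {"feed_id": 101, "query_id": 201, "bind_id": 301, "qname": "q1"},
    {"feed_id": 101, "query_id": 201, "bind_id": 301, "qname": "q2"},
    {"feed_id": 103, "bind_id": 301, "bname": "b1"},
]

# Only objects with all three keys take part in de-duplication.
complete = [d for d in dict_list if all(k in d for k in properties)]

# Dict comprehension: a repeated key is overwritten, so the last object wins.
last_wins = {tuple(d[k] for k in properties): d for d in complete}

# setdefault stores a value only when the key is absent, so the first object wins.
first_wins = {}
for d in complete:
    first_wins.setdefault(tuple(d[k] for k in properties), d)

print(last_wins[(101, 201, 301)]["qname"])   # q2
print(first_wins[(101, 201, 301)]["qname"])  # q1
```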

Pyfiddle for reference: https://pyfiddle.io/fiddle/268c14a1-ccd9-471c-8c61-18ae90d81f47/?i=true

Gowthaman