
Given a list of dictionaries, each with the keys A, B and C, I want to delete every group of duplicates (including the original occurrence) from the list, where two dictionaries count as duplicates if they match on keys A and C only. For example, given the following:

set=[{'A':1,'B':4,'C':2},{'A':5,'B':6,'C':0},{'A':1,'B':5,'C':2},{'A':6,'B':1,'C':9}]

I'm expecting

set=[{'A':5,'B':6,'C':0},{'A':6,'B':1,'C':9}]
  • Does this answer your question? [Remove duplicate dict in list in Python](https://stackoverflow.com/questions/9427163/remove-duplicate-dict-in-list-in-python) – bherbruck May 13 '20 at 01:38
  • Not exactly, since I want to choose specific keys, and I also want to delete all duplicates including the original –  May 13 '20 at 01:41
  • Why isn't @Rajat Mishra's answer accepted? – Je Je May 13 '20 at 02:04
  • Welcome to SO. This isn't a discussion forum or tutorial. Please take the [tour] and take the time to read [ask] and the other links found on that page. Invest some time with [the Tutorial](https://docs.python.org/3/tutorial/index.html) practicing the examples. It will give you an idea of the tools Python offers to help you solve your problem. – wwii May 13 '20 at 02:47
  • @NonoLondon it doesn't work, read my comment –  May 13 '20 at 03:17

2 Answers


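One way to achieve this is to convert your list into a pandas DataFrame, call drop_duplicates with subset=['A','C'] and keep=False so that every row belonging to a duplicated (A, C) pair is dropped (including the first occurrence), and then convert the result back to a list of dictionaries.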

In [32]: import pandas as pd

In [33]: set1=[{'A':1,'B':4,'C':2},{'A':5,'B':6,'C':0},{'A':1,'B':5,'C':2},{'A':6,'B':1,'C':9}]

In [34]: set1
Out[34]:
[{'A': 1, 'B': 4, 'C': 2},
 {'A': 5, 'B': 6, 'C': 0},
 {'A': 1, 'B': 5, 'C': 2},
 {'A': 6, 'B': 1, 'C': 9}]

In [35]: df = pd.DataFrame(set1)

In [36]: df
Out[36]:
   A  B  C
0  1  4  2
1  5  6  0
2  1  5  2
3  6  1  9

In [38]: df.drop_duplicates(subset=['A','C'],keep=False,inplace=True)

In [39]: df
Out[39]:
   A  B  C
1  5  6  0
3  6  1  9

In [40]: df.to_dict(orient='records')
Out[40]: [{'A': 5, 'B': 6, 'C': 0}, {'A': 6, 'B': 1, 'C': 9}]
Rajat Mishra

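This should work for you, although it may not be the fastest possible solution since it iterates through the list twice: once to find which (A, C) pairs repeat, and once to keep only the dictionaries whose pair never repeated.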

Input:

list_in = [{'A':1,'B':4, 'C':2},{'A':5,'B':6,'C':0},{'A':1,'B':5,'C':2},{'A':6,'B':1,'C':9}]
seen = set()  # (A, C) pairs encountered so far
dups = set()  # (A, C) pairs encountered more than once
for dict_in in list_in:
    key = (dict_in['A'], dict_in['C'])
    if key in seen:
        dups.add(key)
    else:
        seen.add(key)

# Keep only the dictionaries whose (A, C) pair never repeated
list_out = [dict_in for dict_in in list_in if (dict_in['A'], dict_in['C']) not in dups]
print(list_out)

Output:

[{'A': 5, 'B': 6, 'C': 0}, {'A': 6, 'B': 1, 'C': 9}]
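
If you'd rather not hard-code 'A' and 'C', the same two-pass idea can probably be written with collections.Counter. This is only a sketch: drop_all_duplicates and its keys parameter are names made up here for illustration, and it assumes the values under those keys are hashable.

```python
from collections import Counter
from operator import itemgetter


def drop_all_duplicates(records, keys=('A', 'C')):
    """Return the dicts whose values for `keys` occur exactly once in `records`.

    Hypothetical helper for illustration; assumes the selected values are hashable.
    """
    get_key = itemgetter(*keys)  # pulls the chosen values out of one dict
    # First pass: count how often each combination of values appears
    counts = Counter(get_key(rec) for rec in records)
    # Second pass: keep only the records whose combination appeared once
    return [rec for rec in records if counts[get_key(rec)] == 1]


list_in = [{'A': 1, 'B': 4, 'C': 2}, {'A': 5, 'B': 6, 'C': 0},
           {'A': 1, 'B': 5, 'C': 2}, {'A': 6, 'B': 1, 'C': 9}]
list_out = drop_all_duplicates(list_in)
print(list_out)
```

It is still two passes over the list, so the running time stays linear in the number of dictionaries, and the original order of the surviving items is preserved.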
Noah Smith