python remove duplications entirely using two keys

Question

Given a list of dictionaries, each with the keys A, B and C, I want to delete duplicates (all of them, including the original) based only on the A and C keys. For example, given the following:

set=[{'A':1,'B':4,'C':2},{'A':5,'B':6,'C':0},{'A':1,'B':5,'C':2},{'A':6,'B':1,'C':9}]

I'm expecting

set=[{'A':5,'B':6,'C':0},{'A':6,'B':1,'C':9}]

Does this answer your question? [Remove duplicate dict in list in Python](https://stackoverflow.com/questions/9427163/remove-duplicate-dict-in-list-in-python) — bherbruck, May 13 '20 at 01:38
Not exactly, since I want to choose specific keys, and I'm looking to delete all duplicates, including the source — , May 13 '20 at 01:41
Welcome to SO. This isn't a discussion forum or tutorial. Please take the [tour] and take the time to read [ask] and the other links found on that page. Invest some time with [the Tutorial](https://docs.python.org/3/tutorial/index.html) practicing the examples. It will give you an idea of the tools Python offers to help you solve your problem. — wwii, May 13 '20 at 02:47

score 4 · Answer 1 · answered May 13 '20 at 01:52

One way to achieve this is to convert your list into a pandas DataFrame, use drop_duplicates with keep=False to drop every row whose A and C values are duplicated, and then convert the result back to a list of dictionaries.

In [32]: import pandas as pd

In [33]: set1=[{'A':1,'B':4,'C':2},{'A':5,'B':6,'C':0},{'A':1,'B':5,'C':2},{'A':6,'B':1,'C':9}]

In [34]: set1
Out[34]:
[{'A': 1, 'B': 4, 'C': 2},
 {'A': 5, 'B': 6, 'C': 0},
 {'A': 1, 'B': 5, 'C': 2},
 {'A': 6, 'B': 1, 'C': 9}]

In [35]: df = pd.DataFrame(set1)

In [36]: df
Out[36]:
   A  B  C
0  1  4  2
1  5  6  0
2  1  5  2
3  6  1  9

In [38]: df.drop_duplicates(subset=['A','C'],keep=False,inplace=True)

In [39]: df
Out[39]:
   A  B  C
1  5  6  0
3  6  1  9

In [40]: df.to_dict(orient='records')
Out[40]: [{'A': 5, 'B': 6, 'C': 0}, {'A': 6, 'B': 1, 'C': 9}]

‘pd’ is a common abbreviation for the pandas package. Simply add ‘import pandas as pd’. — pyOliv, May 13 '20 at 04:30
You have to import pandas as @pyoliv mentioned. Use import pandas as pd. — Rajat Mishra, May 13 '20 at 11:29

score 0 · Accepted Answer · answered May 13 '20 at 01:56

This should work for you, although it may not be the fastest possible solution since it iterates through the list twice.

Input:

list_in = [{'A':1,'B':4, 'C':2},{'A':5,'B':6,'C':0},{'A':1,'B':5,'C':2},{'A':6,'B':1,'C':9}]
seen = set()
dups = set()
for dict_in in list_in:
    if (dict_in['A'], dict_in['C']) in seen:
        dups.add((dict_in['A'], dict_in['C']))
    else:
        seen.add((dict_in['A'], dict_in['C']))

list_out = [dict_in for dict_in in list_in if (dict_in['A'], dict_in['C']) not in dups]
print(list_out)

Output:

[{'A': 5, 'B': 6, 'C': 0}, {'A': 6, 'B': 1, 'C': 9}]
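The same two-pass idea can also be sketched with collections.Counter: count how often each (A, C) pair occurs, then keep only the dictionaries whose pair occurs exactly once. The function name drop_all_duplicates and its keys parameter are made up for illustration; they are not part of the answer above.

```python
from collections import Counter
from operator import itemgetter


def drop_all_duplicates(records, keys=("A", "C")):
    """Return the records whose values for `keys` occur exactly once.

    Hypothetical helper for illustration; assumes every record has all of
    `keys` and that the corresponding values are hashable.
    """
    # itemgetter('A', 'C') returns the tuple (record['A'], record['C'])
    get_key = itemgetter(*keys)
    # First pass: count how many times each key combination appears
    counts = Counter(get_key(record) for record in records)
    # Second pass: keep only records whose combination is unique
    return [record for record in records if counts[get_key(record)] == 1]


list_in = [{'A': 1, 'B': 4, 'C': 2}, {'A': 5, 'B': 6, 'C': 0},
           {'A': 1, 'B': 5, 'C': 2}, {'A': 6, 'B': 1, 'C': 9}]
list_out = drop_all_duplicates(list_in)
print(list_out)
# [{'A': 5, 'B': 6, 'C': 0}, {'A': 6, 'B': 1, 'C': 9}]
```

This still walks the list twice, but it preserves the original order and leaves list_in unmodified.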
