I have a list of dictionaries, where some "term" values are repeated:

terms_dict = [{'term': 'potato', 'cui': '123AB'}, {'term': 'carrot', 'cui': '222AB'}, {'term': 'potato', 'cui': '456AB'}]

As you can see, the term 'potato' appears more than once. I would like to store this term in a variable for future reference, then remove every dictionary with a repeated term from terms_dict, leaving only the 'carrot' dictionary in the list.

Desired output:

repeated_terms = ['potato'] ## identified and stored terms that are repeated in terms_dict. 

new_terms_dict = [{'term': 'carrot', 'cui': '222AB'}] ## new dict with the unique term.

Idea:

I can certainly create a new list containing only the unique terms; however, I am stuck on identifying the "term" that is repeated and storing it in a list.

Is there a Pythonic way of finding, printing, and storing the repeated values?

blah
    Step 1: make a list of all "terms".
    Step 2: find the duplicates in that list: https://stackoverflow.com/questions/9835762/how-do-i-find-the-duplicates-in-a-list-and-create-another-list-with-them/9835819
    Step 3: remove the dictionaries with terms found in step 2: https://stackoverflow.com/questions/7623715/deleting-list-elements-based-on-condition
    – mkrieger1 Aug 26 '21 at 17:24
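
The three steps in the comment above could be sketched in plain Python roughly as follows. This is only an illustration, not the commenter's own code; the helper names `all_terms` and `seen` are assumptions introduced here.

```python
terms_dict = [
    {"term": "potato", "cui": "123AB"},
    {"term": "carrot", "cui": "222AB"},
    {"term": "potato", "cui": "456AB"},
]

# Step 1: collect every "term" value.
all_terms = [d["term"] for d in terms_dict]

# Step 2: record each term that shows up again after its first occurrence.
seen = set()
repeated_terms = []
for term in all_terms:
    if term in seen and term not in repeated_terms:
        repeated_terms.append(term)
    seen.add(term)

# Step 3: keep only the dictionaries whose term was not repeated.
new_terms_dict = [d for d in terms_dict if d["term"] not in repeated_terms]

print(repeated_terms)   # ['potato']
print(new_terms_dict)   # [{'term': 'carrot', 'cui': '222AB'}]
```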

2 Answers

You can use collections.Counter for the task:

from collections import Counter

terms_dict = [
    {"term": "potato", "cui": "123AB"},
    {"term": "carrot", "cui": "222AB"},
    {"term": "potato", "cui": "456AB"},
]

c = Counter(d["term"] for d in terms_dict)

repeated_terms = [k for k, v in c.items() if v > 1]
new_terms_dict = [d for d in terms_dict if c[d["term"]] == 1]

print(repeated_terms)
print(new_terms_dict)

Prints:

['potato']
[{'term': 'carrot', 'cui': '222AB'}]
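
If the same logic is needed in several places, it could be wrapped in a small helper. The sketch below assumes a hypothetical function name, `split_repeated`, and adds a made-up third 'potato' entry (cui '789AB') to show that a term occurring three times is still reported only once.

```python
from collections import Counter


def split_repeated(records, key="term"):
    """Return (repeated values, records whose value for `key` occurs exactly once)."""
    counts = Counter(r[key] for r in records)
    repeated = [value for value, n in counts.items() if n > 1]
    unique_records = [r for r in records if counts[r[key]] == 1]
    return repeated, unique_records


# Sample data; the third 'potato' entry is invented for illustration.
sample = [
    {"term": "potato", "cui": "123AB"},
    {"term": "carrot", "cui": "222AB"},
    {"term": "potato", "cui": "456AB"},
    {"term": "potato", "cui": "789AB"},
]

repeated, unique_records = split_repeated(sample)
print(repeated)        # ['potato']
print(unique_records)  # [{'term': 'carrot', 'cui': '222AB'}]
```

Counting once and then filtering keeps the work to two passes over the list, and the original list is left unmodified.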
Andrej Kesely

You can use drop_duplicates and duplicated from pandas:

>>> import pandas as pd
>>> df = pd.DataFrame(terms_dict)
>>> df.term[df.term.duplicated()].unique().tolist() # repeats, each listed once even if a term occurs 3+ times
['potato']
>>> df.drop_duplicates('term', keep=False).to_dict('records') # without repeats
[{'term': 'carrot', 'cui': '222AB'}]
Sayandip Dutta