I have a list of dictionaries and I want to remove the duplicates from it. How can I do that?

a = [
 {'dtstart': '2014-09-10T08:00:00',
  'end': datetime.datetime(2014, 9, 10, 9, 0),
  'location': 'Brady Auditorium, B-131',
  'partial_date': datetime.date(2014, 9, 10),
  'photo': 'http://tools.medicine.yale.edu/portal/stream?id=01a331e2-42be-4622-b072-0c42b55b436e&w=540&h=700',
  'start': datetime.datetime(2014, 9, 10, 8, 0),
  'stream': '01a331e2-42be-4622-b072-0c42b55b436e',
  'summary': 'Clinical Neuroscience Grand Rounds: "The Mechanism of Impaired Consciousness of Absence Seizures"',
  'uid': '2d671415-c666-498f-a401-01652a08e4b3'},
 {'dtstart': '2014-09-10T08:00:00',
  'end': datetime.datetime(2014, 9, 10, 9, 0),
  'location': 'Brady Auditorium, B-131',
  'partial_date': datetime.date(2014, 9, 10),
  'photo': 'http://tools.medicine.yale.edu/portal/stream?id=ccf667b2-b5a0-464f-8797-66eb36b0bf6c&w=540&h=700',
  'start': datetime.datetime(2014, 9, 10, 8, 0),
  'stream': 'ccf667b2-b5a0-464f-8797-66eb36b0bf6c',
  'summary': 'Clinical Neuroscience Grand Rounds: "The Mechanism of Impaired Consciousness of Absence Seizures"',
  'uid': '2d671415-c666-498f-a401-01652a08e4b3'}
]

Here is what I have tried:

>>> [dict(t) for t in set([tuple(d.items()) for d in a])]

But it still returns duplicate elements.

Cristian Ciupitu
oyks

3 Answers

3

Use a dictionary comprehension to build a dictionary that maps each uid to its dictionary; a later entry with the same uid overwrites the earlier one. Then extract the values to get a list of dictionaries that are unique by uid.

>>> a=[{'end': datetime.datetime(2014, 9, 10, 9, 0), 'uid': '2d671415-c666-498f-a401-01652a08e4b3', 'stream': '01a331e2-42be-4622-b072-0c42b55b436e', 'photo': 'http://tools.medicine.yale.edu/portal/stream?id=01a331e2-42be-4622-b072-0c42b55b436e&w=540&h=700', 'partial_date': datetime.date(2014, 9, 10), 'summary': 'Clinical Neuroscience Grand Rounds: "The Mechanism of Impaired Consciousness of Absence Seizures"', 'start': datetime.datetime(2014, 9, 10, 8, 0), 'location': 'Brady Auditorium, B-131', 'dtstart': '2014-09-10T08:00:00'}, {'end': datetime.datetime(2014, 9, 10, 9, 0), 'uid': '2d671415-c666-498f-a401-01652a08e4b3', 'stream': 'ccf667b2-b5a0-464f-8797-66eb36b0bf6c', 'photo': 'http://tools.medicine.yale.edu/portal/stream?id=ccf667b2-b5a0-464f-8797-66eb36b0bf6c&w=540&h=700', 'partial_date': datetime.date(2014, 9, 10), 'summary': 'Clinical Neuroscience Grand Rounds: "The Mechanism of Impaired Consciousness of Absence Seizures"', 'start': datetime.datetime(2014, 9, 10, 8, 0), 'location': 'Brady Auditorium, B-131', 'dtstart': '2014-09-10T08:00:00'}]
>>> list({d['uid']: d for d in a}.values())
[{'dtstart': '2014-09-10T08:00:00',
  'end': datetime.datetime(2014, 9, 10, 9, 0),
  'location': 'Brady Auditorium, B-131',
  'partial_date': datetime.date(2014, 9, 10),
  'photo': 'http://tools.medicine.yale.edu/portal/stream?id=ccf667b2-b5a0-464f-8797-66eb36b0bf6c&w=540&h=700',
  'start': datetime.datetime(2014, 9, 10, 8, 0),
  'stream': 'ccf667b2-b5a0-464f-8797-66eb36b0bf6c',
  'summary': 'Clinical Neuroscience Grand Rounds: "The Mechanism of Impaired Consciousness of Absence Seizures"',
  'uid': '2d671415-c666-498f-a401-01652a08e4b3'}]
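
As a minimal, self-contained sketch of the same idea: the shortened records and the name `unique` below are illustrative assumptions, not the original data. Wrapping the result in `list()` gives a real list on Python 3, where `.values()` returns a view rather than a list.

```python
# Shortened, made-up records: the first two share a uid but differ in 'stream'.
a = [
    {'uid': '2d671415', 'stream': '01a331e2'},
    {'uid': '2d671415', 'stream': 'ccf667b2'},
    {'uid': 'ffff0000', 'stream': 'aaaa1111'},
]

# Later entries overwrite earlier ones that share a uid,
# so exactly one dictionary per uid survives.
unique = list({d['uid']: d for d in a}.values())
print(unique)
```

The original attempt with `set(tuple(d.items()) ...)` only removes dictionaries that are equal in every field, which is why it keeps both records here: they differ in `stream`.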
mhawke
3

Just try the following code:

{document['uid']: document for document in a}.values()

For every uid you get the last document in the list that has it. If you want the first one instead, iterate over the list in reverse:

{document['uid']: document for document in a[::-1]}.values()
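
If the order of the result matters, note that building the dictionary from the reversed list also reverses the order of the values on Python 3.7+, where dictionaries keep insertion order. One possible alternative, sketched here with made-up shortened records and the hypothetical names `first_seen` and `unique_first`, is `dict.setdefault`, which keeps the first document per uid in the original order:

```python
# Shortened, made-up records: the first two share a uid but differ in 'stream'.
a = [
    {'uid': '2d671415', 'stream': '01a331e2'},
    {'uid': '2d671415', 'stream': 'ccf667b2'},
    {'uid': 'ffff0000', 'stream': 'aaaa1111'},
]

first_seen = {}
for document in a:
    # setdefault stores the document only if its uid is not already a key,
    # so the first document seen for each uid wins.
    first_seen.setdefault(document['uid'], document)

unique_first = list(first_seen.values())
print(unique_first)
```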
Vladimir
2

Try adding each uid to a temporary set and checking every dictionary's uid against it before appending the dictionary to the result:

import datetime

a=[{'end': datetime.datetime(2014, 9, 10, 9, 0), 'uid': '2d671415-c666-498f-a401-01652a08e4b3', 'stream': '01a331e2-42be-4622-b072-0c42b55b436e', 'photo': 'http://tools.medicine.yale.edu/portal/stream?id=01a331e2-42be-4622-b072-0c42b55b436e&w=540&h=700', 'partial_date': datetime.date(2014, 9, 10), 'summary': 'Clinical Neuroscience Grand Rounds: "The Mechanism of Impaired Consciousness of Absence Seizures"', 'start': datetime.datetime(2014, 9, 10, 8, 0), 'location': 'Brady Auditorium, B-131', 'dtstart': '2014-09-10T08:00:00'},
   {'end': datetime.datetime(2014, 9, 10, 9, 0), 'uid': '2d671415-c666-498f-a401-01652a08e4b3', 'stream': 'ccf667b2-b5a0-464f-8797-66eb36b0bf6c', 'photo': 'http://tools.medicine.yale.edu/portal/stream?id=ccf667b2-b5a0-464f-8797-66eb36b0bf6c&w=540&h=700', 'partial_date': datetime.date(2014, 9, 10), 'summary': 'Clinical Neuroscience Grand Rounds: "The Mechanism of Impaired Consciousness of Absence Seizures"', 'start': datetime.datetime(2014, 9, 10, 8, 0), 'location': 'Brady Auditorium, B-131', 'dtstart': '2014-09-10T08:00:00'}]

uuids = set()  # temporary set holding the uids seen so far
final = []

for i in a:
    # keep only the first dictionary seen for each uid
    if i['uid'] not in uuids:
        final.append(i)
        uuids.add(i['uid'])
print(final)
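
The same loop could be wrapped in a small reusable helper. This is only a sketch: `unique_by` and its `key` parameter are hypothetical names introduced here, and the records are shortened, made-up stand-ins for the real data.

```python
def unique_by(items, key):
    """Yield the first item seen for each distinct key(item), in input order."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


# Shortened, made-up records: the first two share a uid but differ in 'stream'.
a = [
    {'uid': '2d671415', 'stream': '01a331e2'},
    {'uid': '2d671415', 'stream': 'ccf667b2'},
    {'uid': 'ffff0000', 'stream': 'aaaa1111'},
]

final = list(unique_by(a, key=lambda d: d['uid']))
print(final)
```

The key values must be hashable for the set lookup to work; a uid string is, and a tuple of several fields would be as well.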
Vladimir
sundar nataraj