263

Let's say I have a list of dictionaries:

[
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

How can I obtain a list of unique dictionaries (removing the duplicates)?

[
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

See How can I properly hash dictionaries with a common set of keys, for deduplication purposes? for in-depth, technical discussion of why the usual approach for deduplicating a list (explained at Removing duplicates in lists) does not work.

Karl Knechtel
Limaaf
  • 6
    How extensive are these dictionaries? Do you need individual attribute checking to determine duplicates, or is checking a single value in them sufficient? – g.d.d.c Jun 18 '12 at 23:33
  • These dicts got 8 key:value pairs and the list got 200 dicts. They actually got an ID and it's safe for me to remove the dict from list if the ID value found is a duplicate. – Limaaf Jun 18 '12 at 23:37
  • Possible duplicate of [How to make values in list of dictionary unique?](http://stackoverflow.com/questions/31792680/how-to-make-values-in-list-of-dictionary-unique) – Abhijeet Feb 09 '17 at 01:54
  • 1
    [frozenset](https://docs.python.org/2/library/stdtypes.html#frozenset) is an effective option. [`set(frozenset(i.items()) for i in list)`](https://repl.it/Fcss/1) – Abhijeet Feb 09 '17 at 01:58

23 Answers

356

So make a temporary dict with the id as the key. This filters out the duplicates. The values() of that dict will be the deduplicated list.

In Python 2.7

>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ]
>>> {v['id']:v for v in L}.values()
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

In Python 3

>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ] 
>>> list({v['id']:v for v in L}.values())
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

In Python 2.5/2.6

>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ] 
>>> dict((v['id'],v) for v in L).values()
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
John La Rooy
  • @John La Rooy - how could one use the same to remove dictionaries from a list based on multiple attributes? Tried this but it seems not to work > {v['flight']['lon']['lat']: v for v in stream}.values() – Jorge Vidinha Sep 13 '15 at 10:04
  • 2
    @JorgeVidinha assuming each could be cast to str (or unicode), try this: `{str(v['flight'])+':'+str(v['lon'])+','+str(v['lat']): v for v in stream}.values()` This just creates a unique key based on your values. Like `'MH370:-21.474370,86.325589'` – whunterknight Dec 21 '16 at 21:25
  • 5
    @JorgeVidinha, you can use a tuple as the dictionary key `{(v['flight'], v['lon'], v['lat']): v for v in stream}.values()` – John La Rooy Dec 22 '16 at 03:10
  • note that this may alter the order of the dictionaries in the list! use `OrderedDict` from `collections` `list(OrderedDict((v['id'], v) for v in L).values())` or sort the resulting list if that works better for you – gevra Dec 05 '18 at 18:40
  • 2
    If you need all values considered and not just the ID you can use ```list({str(i):i for i in L}.values())``` Here we use str(i) to create a unique string that represents the dictionary which is used to filter the duplicates. – DelboyJay Jul 19 '19 at 14:43
  • @DelboyJay, dicts are unordered, so you'd need to use `str(sorted(i.items()))` – John La Rooy Jul 21 '19 at 19:32
  • 3
    This does not actually de-duplicate identical dictionaries (where dict1 == dict2 returns true). The solution only works if you have identified a key to compare. – Ernesto Apr 07 '20 at 18:51
  • 1
    Hey can someone explain what is actually happening here? I don't know. ```list({v['id']:v for v in L}.values())``` – I_am_learning_now Jul 17 '20 at 00:44
  • 2
    `v['id']:v for v in L` creates new dictionary with ids as keys, and whole dicts as values. By default, keys in dictionaries are unique, so if the dict with the same id is being added to this new dictionary, it overwrites previous dict with the same id. `.values()` returns a view object that displays a list of all the values in the dictionary - here a list of whole unique (by id) dicts. And `list(...)` just converts the `dict_values` object of returned view to simple Python `list`. – I.P. Dec 12 '21 at 13:01
122

The usual way to get just the unique elements of a collection is to use Python's set class. Just add all the elements to the set, then convert the set to a list, and bam, the duplicates are gone.

The problem, of course, is that a set() can only contain hashable entries, and a dict is not hashable.

If I had this problem, my solution would be to convert each dict into a string that represents the dict, add all the strings to a set(), read the string values back out as a list(), and convert each back to a dict.

A good representation of a dict in string form is JSON format. And Python has a built-in module for JSON (called json of course).

The remaining problem is that the elements in a dict are not ordered, and when Python converts the dict to a JSON string, you might get two JSON strings that represent equivalent dictionaries but are not identical strings. The easy solution is to pass the argument sort_keys=True when you call json.dumps().
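A minimal sketch of that approach (variable names are mine, not from the answer):

```python
import json

dicts = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

# Serialize each dict with sorted keys so equivalent dicts produce
# identical strings, deduplicate via a set, then decode back to dicts.
unique = [json.loads(s) for s in {json.dumps(d, sort_keys=True) for d in dicts}]
```

Note the result order depends on set iteration, so it may differ from the input order.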

EDIT: This solution was assuming that a given dict could have any part different. If we can assume that every dict with the same "id" value will match every other dict with the same "id" value, then this is overkill; @gnibbler's solution would be faster and easier.

EDIT: Now there is a comment from André Lima explicitly saying that if the ID is a duplicate, it's safe to assume that the whole dict is a duplicate. So this answer is overkill and I recommend @gnibbler's answer.

Marco Sulla
steveha
  • 4
    While overkill given the ID in this particular case, this is still an excellent answer! – Josh Werts Sep 03 '13 at 17:37
  • 16
    This helps me since my dictionary does not have a key, and is only uniquely identified by all of its entries. Thanks! – ericso Sep 24 '14 at 16:51
  • 1
    This solution works most of the time, but there may be performance issues at scale; the author, I think, knows this and therefore recommends the solution keyed on "id". Performance concerns: this solution serializes to string and then deserializes; serializing/deserializing is expensive computation and does not usually scale well (number of items n>1e6, or each dictionary contains >1e6 items, or both), or if you have to execute this many times (>1e6) or often. – Trevor Boyd Smith Nov 14 '19 at 13:37
  • 3
    Just as a short aside this solution illustrates a great canonical example of why you would want to design your solution... i.e. if you have an id that is unique... then you can efficiently access the data... if you are lazy and don't have an id then your data access is more expensive. – Trevor Boyd Smith Nov 14 '19 at 13:40
  • Implementation: `output_lod = {json.dumps(d, sort_keys=True) for d in lod}; output_lod = [json.loads(x) for x in output_lod]` – Alec May 09 '20 at 19:58
  • `list(map(json.loads, set(map(lambda x: json.dumps(x, sort_keys=True), [{1:2}, {3:4}, {1:2}]))))` only problem with this solution is when keys are not strings. – Guru Vishnu Vardhan Reddy Dec 22 '21 at 04:02
81

In case the dictionaries are only uniquely identified by all items (no ID is available), you can use the answer using JSON. The following is an alternative that does not use JSON, and will work as long as all the dictionary values are hashable:

[dict(s) for s in set(frozenset(d.items()) for d in L)]
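For example, applied to the list from the question (a quick check, with `L` as in the question):

```python
L = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

# frozenset(d.items()) is hashable, so the set removes duplicates;
# dict() then rebuilds a dict from each surviving frozenset of pairs.
unique = [dict(s) for s in set(frozenset(d.items()) for d in L)]
```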
Sina
26

Here's a reasonably compact solution, though I suspect not particularly efficient (to put it mildly):

>>> ds = [{'id':1,'name':'john', 'age':34},
...       {'id':1,'name':'john', 'age':34},
...       {'id':2,'name':'hanna', 'age':30}
...       ]
>>> map(dict, set(tuple(sorted(d.items())) for d in ds))
[{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]
Greg E.
  • 8
    Surround the `map()` call with `list()` in Python 3 to get a list back, otherwise it's a `map` object. – dmn Jun 21 '17 at 18:49
  • an additional benefit of this approach in python 3.6+ is that the list ordering is preserved – jnnnnn Oct 29 '19 at 05:15
  • @jnnnnn I'm using Python 3.8.6 and list ordering is not preserved! My list: `x=[{'a':15}, {'a':15}, {'b':30}]` Converting: `list(map(dict, set(tuple(sorted(i.items())) for i in x)))` which returns: `[{'b': 30}, {'a': 15}]` – Shayan Nov 15 '21 at 10:32
20

You can use the numpy library (this form works for Python 2.x only):

   import numpy as np 

   list_of_unique_dicts=list(np.unique(np.array(list_of_dicts)))

To get it to work with Python 3.x (and recent versions of numpy), you need to convert the array of dicts to a numpy array of strings, e.g.

list_of_unique_dicts=list(np.unique(np.array(list_of_dicts).astype(str)))

Note that the resulting elements are then string representations of the dicts, not dicts themselves.
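If you need actual dicts back rather than their string forms, one hedged way (assuming the dict values are plain literals) is to re-parse each string with `ast.literal_eval`:

```python
import ast
import numpy as np

list_of_dicts = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

# np.unique deduplicates the string representations;
# literal_eval turns each surviving string back into a dict.
unique_strings = np.unique(np.array(list_of_dicts).astype(str))
list_of_unique_dicts = [ast.literal_eval(s) for s in unique_strings]
```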
bubble
15
a = [
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
]

b = {x['id']:x for x in a}.values()

print(b)

outputs:

[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

Yusuf X
  • In the same example. how can I get the dicts containing only the similar IDs ? – user8162 Apr 10 '16 at 16:13
  • @user8162, what would you want the output to look like? – Yusuf X Apr 12 '16 at 14:20
  • Sometimes, I will have same ID, but different age. so output to be [{'age': [34, 40], 'id': 1, 'name': ['john', Peter]}]. In short, if IDs are same, then combine the contents of others to a list as I mentioned here. Thanks in advance. – user8162 Apr 12 '16 at 15:56
  • 2
    b = {x['id']:[y for y in a if y['id'] == x['id'] ] for x in a} is one way to group them together. – Yusuf X Apr 18 '16 at 08:07
8

Since the id is sufficient for detecting duplicates, and the id is hashable: run 'em through a dictionary that has the id as the key. The value for each key is the original dictionary.

deduped_dicts = dict((item["id"], item) for item in list_of_dicts).values()

In Python 3, values() doesn't return a list; you'll need to wrap the whole right-hand-side of that expression in list(), and you can write the meat of the expression more economically as a dict comprehension:

deduped_dicts = list({item["id"]: item for item in list_of_dicts}.values())

Note that the result likely will not be in the same order as the original. If that's a requirement, you could use a `collections.OrderedDict` instead of a dict.
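A sketch of that order-preserving variant (on Python 3.7+ a plain dict already preserves insertion order, so this mainly matters on older versions):

```python
from collections import OrderedDict

list_of_dicts = [
    {'id': 2, 'name': 'hanna', 'age': 30},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
]

# Keyed by id; first-seen order is kept, later duplicates
# overwrite the value without moving the key's position.
deduped_dicts = list(OrderedDict((d['id'], d) for d in list_of_dicts).values())
```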

As an aside, it may make a good deal of sense to just keep the data in a dictionary that uses the id as key to begin with.

kindall
8

We can do this with pandas:

import pandas as pd
yourdict=pd.DataFrame(L).drop_duplicates().to_dict('records')
Out[293]: [{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

Note this is slightly different from the accepted answer.

drop_duplicates checks all columns in pandas; a row is dropped only if all of its values match another row's.

For example:

If we change the 2nd dict's name from john to peter:

L=[
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'peter', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
pd.DataFrame(L).drop_duplicates().to_dict('records')
Out[295]: 
[{'age': 34, 'id': 1, 'name': 'john'},
 {'age': 34, 'id': 1, 'name': 'peter'},  # this row is still kept in the output
 {'age': 30, 'id': 2, 'name': 'hanna'}]
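If you do want pandas to deduplicate on the id alone, `drop_duplicates` accepts a `subset` parameter; a small sketch:

```python
import pandas as pd

L = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'peter', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

# Compare only the 'id' column; keep the first row seen for each id.
unique = pd.DataFrame(L).drop_duplicates(subset=['id']).to_dict('records')
```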
cs95
BENY
8

In Python 3, a simple trick, but based on a unique field (id):

data = [ {'id': 1}, {'id': 1}]

list({ item['id'] : item for item in data}.values())
7

I have summarized my favorites to try out:

https://repl.it/@SmaMa/Python-List-of-unique-dictionaries

# ----------------------------------------------
# Setup
# ----------------------------------------------

myList = [
  {"id":"1", "lala": "value_1"},
  {"id": "2", "lala": "value_2"}, 
  {"id": "2", "lala": "value_2"}, 
  {"id": "3", "lala": "value_3"}
]
print("myList:", myList)

# -----------------------------------------------
# Option 1 if objects have a unique identifier
# -----------------------------------------------

myUniqueList = list({myObject['id']:myObject for myObject in myList}.values())
print("myUniqueList:", myUniqueList)

# -----------------------------------------------
# Option 2 if uniquely identified by the whole object
# -----------------------------------------------

myUniqueSet = [dict(s) for s in set(frozenset(myObject.items()) for myObject in myList)]
print("myUniqueSet:", myUniqueSet)

# -----------------------------------------------
# Option 3 for hashable objects (not dicts)
# -----------------------------------------------

myHashableObjects = list(set(["1", "2", "2", "3"]))
print("myHashAbleList:", myHashableObjects)
Sma Ma
6

There are a lot of answers here, so let me add another:

import json
from typing import List

def dedup_dicts(items: List[dict]):
    dedupped = [json.loads(s) for s in set(json.dumps(item, sort_keys=True) for item in items)]
    return dedupped

items = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
dedup_dicts(items)
monkut
4

I don't know if you only want the ids of your dicts in the list to be unique, but if the goal is a set of dicts where uniqueness is over all keys' values, you should use a tuple key in your comprehension:

>>> L=[
...     {'id':1,'name':'john', 'age':34},
...    {'id':1,'name':'john', 'age':34}, 
...    {'id':2,'name':'hanna', 'age':30},
...    {'id':2,'name':'hanna', 'age':50}
...    ]
>>> len(L)
4
>>> L=list({(v['id'], v['age'], v['name']):v for v in L}.values())
>>> L
[{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}, {'id': 2, 'name': 'hanna', 'age': 50}]
>>> len(L)
3

Hope it helps you or anyone else with the same concern.

nixmind
  • Similar with comprehensive answers above BUT, this is more generic and might provide full unique option. So this is upvoted. – asevindik Jan 13 '22 at 16:24
3

Expanding on John La Rooy's answer (Python - List of unique dictionaries), making it a bit more flexible:

def dedup_dict_list(list_of_dicts: list, columns: list) -> list:
    # A tuple key avoids the collisions that plain string
    # concatenation of the column values could produce.
    return list({tuple(row[column] for column in columns): row
                 for row in list_of_dicts}.values())

Calling the function:

sorted_list_of_dicts = dedup_dict_list(
    unsorted_list_of_dicts, ['id', 'name'])
Illegal Operator
3

If there is not a unique id in the dictionaries, then I'd keep it simple and define a function as follows:

def unique(sequence):
    result = []
    for item in sequence:
        if item not in result:
            result.append(item)
    return result

The advantage of this approach is that you can reuse the function for any comparable objects. It makes your code very readable, works in all modern versions of Python, and preserves the order of the dictionaries. The `item not in result` check is a linear scan, so the whole thing is quadratic, but for short lists like this it is plenty fast.

>>> L = [
... {'id': 1, 'name': 'john', 'age': 34},
... {'id': 1, 'name': 'john', 'age': 34},
... {'id': 2, 'name': 'hanna', 'age': 30},
... ] 
>>> unique(L)
[{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}]
Michael
1

In Python 3.6+ (which is what I've tested), just use:

import json

#Toy example, but will also work for your case 
myListOfDicts = [{'a':1,'b':2},{'a':1,'b':2},{'a':1,'b':3}]
#Start by sorting each dictionary by keys
myListOfDictsSorted = [sorted(d.items()) for d in myListOfDicts]

#Using json methods with set() to get unique dict
myListOfUniqueDicts = [dict(pairs) for pairs in map(json.loads, set(map(json.dumps, myListOfDictsSorted)))]

print(myListOfUniqueDicts)

Explanation: we map json.dumps to encode each sorted list of (key, value) pairs as a JSON string, which is immutable; set can then produce an iterable of unique strings. json.loads decodes each survivor back into a list of pairs, and dict() rebuilds the dictionary (json.loads alone would leave you with lists of pairs, not dicts). Note that one must first sort by keys to put each dictionary into a canonical form, otherwise equivalent dicts could serialize to different strings; sorted(d.items()) does that regardless of Python version.

VanillaSpinIce
  • 1
    Remember to sort the keys before dumping to JSON. You also don't need to convert to `list` before doing `set`. – Nathan Apr 06 '19 at 16:07
1

Well, all the answers mentioned here are good, but some can raise errors if the dictionary items contain a nested list or dictionary, so I propose a simple answer:

import ast

a = [str(i) for i in a]
a = list(set(a))
a = [ast.literal_eval(i) for i in a]  # literal_eval parses the repr safely, unlike eval
PRAKHAR KAUSHIK
1

Objects can fit into sets. You can work with objects instead of dicts and, if needed, after all the set insertions convert back to a list of dicts. Example:

class Person:
    def __init__(self, id, age, name):
        self.id = id
        self.age = age
        self.name = name

    # Without these, a set compares Person objects by identity,
    # so duplicates would not be removed.
    def __eq__(self, other):
        return (self.id, self.age, self.name) == (other.id, other.age, other.name)

    def __hash__(self):
        return hash((self.id, self.age, self.name))

my_set = {Person(id=2, age=3, name='Jhon')}

my_set.add(Person(id=3, age=34, name='Guy'))

my_set.add(Person(id=2, age=3, name='Jhon'))  # duplicate, not added

# if needed convert to list of dicts
list_of_dict = [{'id': obj.id,
                 'name': obj.name,
                 'age': obj.age} for obj in my_set]
juan Isaza
  • 1
    A shorter way to define Person: `Person = collections.namedtuple('Person', ['id', 'age', 'name'])` – darw Nov 04 '21 at 10:24
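Following up on that namedtuple suggestion: namedtuples are hashable and compare field-by-field, so a set deduplicates them out of the box. A small sketch:

```python
from collections import namedtuple

Person = namedtuple('Person', ['id', 'age', 'name'])

# The two identical Persons collapse into one set entry.
people = {Person(id=1, age=34, name='john'),
          Person(id=1, age=34, name='john'),
          Person(id=2, age=30, name='hanna')}

# _asdict() converts each namedtuple back to a dict if needed.
list_of_dict = [dict(p._asdict()) for p in people]
```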
0

A quick-and-dirty solution is to just generate a new list:

deduped = []

for item in listwhichneedssorting:
    if item not in deduped:
        deduped.append(item)
lyzazel
0

Let me add mine.

  1. sort each dict so that {'a': 1, 'b': 2} and {'b': 2, 'a': 1} are not treated differently

  2. dump it to JSON

  3. deduplicate via a set (since a set cannot contain dicts)

  4. turn each string back into a dict via json.loads

import json

[json.loads(i) for i in set(json.dumps(i) for i in [dict(sorted(i.items())) for i in target_dict])]
0

There may be more elegant solutions, but I thought it might be nice to add a more verbose solution to make it easier to follow. This assumes there is no unique key, you have a simple key/value structure, and you are using a version of Python that guarantees dict insertion order (3.7+). This would work for the original post.

data_set = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

# list of keys
keys = list(data_set[0])

# Create a list of lists of the values from the data set
data_set_list = [list(d.values()) for d in data_set]

# Dedupe
new_data_set = []
for lst in data_set_list:
    # Check if list exists in new data set
    if lst in new_data_set:
        print(lst)
        continue
    # Add list to new data set
    new_data_set.append(lst)

# Create dicts
new_data_set = [dict(zip(keys,lst)) for lst in new_data_set]    

print(new_data_set)
TYPKRFT
-1

Pretty straightforward option:

L = [
    {'id':1,'name':'john', 'age':34},
    {'id':1,'name':'john', 'age':34},
    {'id':2,'name':'hanna', 'age':30},
    ]


D = dict()
for l in L:
    D[l['id']] = l
output = list(D.values())
print(output)
jedwards
-2

Here's an implementation with little memory overhead, at the cost of not being as compact as the rest.

values = [ {'id':2,'name':'hanna', 'age':30},
           {'id':1,'name':'john', 'age':34},
           {'id':1,'name':'john', 'age':34},
           {'id':2,'name':'hanna', 'age':30},
           {'id':1,'name':'john', 'age':34},]
count = {}
index = 0
while index < len(values):
    if values[index]['id'] in count:
        del values[index]
    else:
        count[values[index]['id']] = 1
        index += 1

output:

[{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]
Samy Vilar
  • 1
    You need to test this a bit more. Modifying the list while you are iterating over it might not always work as you expect – John La Rooy Jun 19 '12 at 00:03
  • @gnibbler very good point! I'll delete the answer and test it more thoroughly. – Samy Vilar Jun 19 '12 at 00:05
  • Looks better. You can use a set to keep track of the ids instead of the dict. Consider starting the `index` at `len(values)` and counting backwards, that means that you can always decrement `index` whether you `del` or not. eg `for index in reversed(range(len(values))):` – John La Rooy Jun 19 '12 at 00:41
  • @gnibbler interesting, do sets have near constant look up like dictionaries? – Samy Vilar Jun 19 '12 at 00:51
-4

This is the solution I found:

usedID = []

x = [
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
]

for each in x:
    if each['id'] in usedID:
        x.remove(each)
    else:
        usedID.append(each['id'])

print(x)

Basically you check if the ID is present in the list; if it is, delete the dictionary; if not, append the ID to the list.
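As the comments point out, removing items from a list while iterating over it can skip elements; iterating over a shallow copy avoids that. A hedged sketch of that fix:

```python
usedID = []
x = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

# Iterate over a shallow copy (x[:]) so removals from x don't
# shift the positions the loop has yet to visit.
for each in x[:]:
    if each['id'] in usedID:
        x.remove(each)
    else:
        usedID.append(each['id'])
```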

tabchas
  • I'd use a set rather than list for usedID. It's a faster lookup, and more readable – happydave Jun 18 '12 at 23:44
  • Yea i didnt know about sets... but I am learning... I was just looking at @gnibbler answer... – tabchas Jun 18 '12 at 23:46
  • 2
    You need to test this a bit more. Modifying the list while you are iterating over it might not always work as you expect – John La Rooy Jun 19 '12 at 00:05
  • Yea I don't understand why it doesn't work... Any ideas what I'm doing wrong? – tabchas Jun 19 '12 at 02:05
  • No I caught the problem... its just that I dont understand why its giving that problem... do you know? – tabchas Jun 19 '12 at 03:39
  • When you remove an item from the list, all the remaining items are moved down one place, so `each` never references the item following one that is removed – John La Rooy Jun 19 '12 at 03:43