Removing duplicated dictionaries from list

Question

I have a list of dictionaries with the same keys and different values, but sometimes it can contain duplicates:

[{'colorName': u'red',
  'color_thumb': [],
  'main_zoom_picture': u'webcontent/0007/991/393/cn7991393.jpg',
  'pic_uris': [(u'S', u'webcontent/0007/991/248/cn.jpg')],
  'swatch_image_path': u'webcontent/0007/991/248/cn7991248.jpg'},
 {'colorName': u'red',
  'color_thumb': [],
  'main_zoom_picture': u'webcontent/0007/991/393/cn7991393.jpg',
  'pic_uris': [(u'S', u'webcontent/0007/991/248/cn.jpg')],
  'swatch_image_path': u'webcontent/0007/991/248/cn7991248.jpg'}]

I'm doing:

[dict(tupleized) for tupleized in set(tuple(item.items()) for item in shared_list)]

And receiving:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

score 4 · Accepted Answer · edited May 23 '17 at 12:04

This does the trick on your set of data for me.

from itertools import groupby

print [k for k,v in groupby(sorted(shared_list))]

Taken from this question
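
Note that calling sorted() on a list of dicts relies on Python 2 ordering rules. What follows is a minimal sketch of the same sort-then-group idea that should also run on Python 3, assuming every value is either hashable and comparable or a list of such values. The helper name sort_key and the shortened sample data are hypothetical, not part of the original answer:

```python
from itertools import groupby


def sort_key(d):
    # Hypothetical helper: build a sortable, hashable tuple of (key, value)
    # pairs, converting list values to tuples along the way.
    return tuple((k, tuple(v) if isinstance(v, list) else v)
                 for k, v in sorted(d.items()))


# Shortened sample data (assumption: same shape as the question's dicts).
shared_list = [
    {'colorName': 'red', 'color_thumb': [], 'pic_uris': [('S', 'cn.jpg')]},
    {'colorName': 'blue', 'color_thumb': [], 'pic_uris': [('S', 'cn.jpg')]},
    {'colorName': 'red', 'color_thumb': [], 'pic_uris': [('S', 'cn.jpg')]},
]

# Sort so equal dicts become adjacent, then keep the first dict of each group.
ordered = sorted(shared_list, key=sort_key)
unique = [next(group) for _, group in groupby(ordered, key=sort_key)]
```

As with the original one-liner, the result comes back in sorted order rather than in the original list order.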


answered Aug 01 '14 at 11:51

Christoph Hegemann


Breaks on Python 3, though, since dicts are no longer comparable. – user2357112 Aug 01 '14 at 12:46
@Christoph Hegemann: Is there a way to also print the duplicate dicts? – JavaSa Sep 26 '17 at 21:06

Martijn Pieters · Answer 2 · 2014-08-01T11:53:24.803

You have your loops inverted; you need to loop over shared_list first:

[dict(tupleized) for item in shared_list for tupleized in set(tuple(item.items()))]

A list comprehension lists the loops in nesting order; left is outermost.
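
To illustrate the nesting order, here is a small sketch with made-up values (the names pairs and expanded are only for illustration). The comprehension and the explicit nested loops build the same list:

```python
# The leftmost `for` in a comprehension is the outermost loop.
pairs = [(outer, inner) for outer in 'ab' for inner in (1, 2)]

# Equivalent explicit loops, written in the same left-to-right order.
expanded = []
for outer in 'ab':
    for inner in (1, 2):
        expanded.append((outer, inner))

assert pairs == expanded
```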

The next problem is that your values contain lists, which cannot be added to a set unaltered.
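
A short sketch of that restriction, using a made-up list value: adding the list itself raises TypeError, while its tuple equivalent is accepted (the exact wording of the error message may vary between Python versions):

```python
value = ['S', 'webcontent/0007/991/248/cn.jpg']

seen = set()
error = None
try:
    seen.add(value)          # lists are mutable, so they are not hashable
except TypeError as exc:
    error = str(exc)

seen.add(tuple(value))       # the tuple equivalent can be hashed and stored
```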

Finally, you need to keep the set outside the loop so you can test whether a dictionary has been seen before:

def immutable_repr(d):
    # Build a hashable tuple of (key, value) pairs; list values become tuples.
    return tuple((k, tuple(v) if isinstance(v, list) else v)
                 for k, v in sorted(d.items()))

seen = set()
[d for d in shared_list if immutable_repr(d) not in seen and not seen.add(immutable_repr(d))]

Here immutable_repr() takes care of producing an immutable tuple from each dictionary:

>>> immutable_repr(shared_list[0])
(('colorName', u'red'), ('color_thumb', ()), ('main_zoom_picture', u'webcontent/0007/991/393/cn7991393.jpg'), ('pic_uris', ((u'S', u'webcontent/0007/991/248/cn.jpg'),)), ('swatch_image_path', u'webcontent/0007/991/248/cn7991248.jpg'))

The sorting ensures that even for dictionaries with a different key-order (which can alter based on the insertion and deletion history of the dictionary) the test for having seen it still works.
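
A minimal sketch of why the sorting matters, assuming Python 3.7+ (where dicts preserve insertion order) and using only hashable values to keep it short. The two dicts are equal but were filled in a different order:

```python
a = {}
a['colorName'] = 'red'
a['swatch_image_path'] = 'cn7991248.jpg'

b = {}
b['swatch_image_path'] = 'cn7991248.jpg'   # same items, inserted in reverse
b['colorName'] = 'red'

# Without sorting, the item tuples differ because the key order differs.
unsorted_equal = tuple(a.items()) == tuple(b.items())

# Sorting the items first gives both dicts the same representation.
sorted_equal = tuple(sorted(a.items())) == tuple(sorted(b.items()))
```

Here unsorted_equal is False and sorted_equal is True, so only the sorted form is safe to use as a membership key.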

The seen set tracks which dictionaries have been seen so far, so that any subsequent duplicates are filtered out:

>>> from pprint import pprint
>>> seen = set()
>>> pprint([d for d in shared_list if immutable_repr(d) not in seen and not seen.add(immutable_repr(d))])
[{'colorName': u'red',
  'color_thumb': [],
  'main_zoom_picture': u'webcontent/0007/991/393/cn7991393.jpg',
  'pic_uris': [(u'S', u'webcontent/0007/991/248/cn.jpg')],
  'swatch_image_path': u'webcontent/0007/991/248/cn7991248.jpg'}]
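
The same idea can be written as an explicit loop, which computes the key only once per dictionary and can also collect the duplicates that were dropped. This is only a sketch: freeze() is a helper in the spirit of immutable_repr(), and the names freeze, split_duplicates and the shortened sample data are hypothetical:

```python
def freeze(d):
    # Hashable stand-in for a dict: sorted (key, value) pairs, lists as tuples.
    return tuple((k, tuple(v) if isinstance(v, list) else v)
                 for k, v in sorted(d.items()))


def split_duplicates(dicts):
    """Return (unique, duplicates), preserving the original order."""
    seen = set()
    unique, duplicates = [], []
    for d in dicts:
        key = freeze(d)              # computed once per dictionary
        if key in seen:
            duplicates.append(d)     # an equal dict was already kept
        else:
            seen.add(key)
            unique.append(d)
    return unique, duplicates


# Shortened sample data (assumption: same shape as the question's dicts).
shared_list = [
    {'colorName': 'red', 'color_thumb': [], 'pic_uris': [('S', 'cn.jpg')]},
    {'colorName': 'blue', 'color_thumb': [], 'pic_uris': [('S', 'cn.jpg')]},
    {'colorName': 'red', 'color_thumb': [], 'pic_uris': [('S', 'cn.jpg')]},
]

unique, duplicates = split_duplicates(shared_list)
```

Unlike the sort-based approach, this keeps the first occurrence of each dictionary in its original position.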

You're still going to run into the problem of lists as values. — user2357112, Aug 01 '14 at 11:37
@user3761151: there are more errors to deal with here, just a mo. Your whole concept doesn't work. — Martijn Pieters, Aug 01 '14 at 11:45

rainman · Answer 3 · 2014-08-01T23:03:06.163

Martijn Pieters' explanation is correct about your error in the list comprehension. The items() returned for each dict in the list will be a list. In other words, a set cannot hash a list of lists.

However, you can store a tuple of tuples within a set. So you can make the following change to your line of code.

>>> [dict(tupleized) for tupleized in set([tuple(tup for tup in item.items()) for item in shared_list])]

Also, Christoph Hegemann's answer is very elegant. If you have the time and inclination, please check itertools (I discovered it recently and it's great to use).

My apologies to Christoph, because I would up-vote your answers but I just became a community member recently. So I have no rep :(
